PREFACE


The synthesis and effective control of Air Force air-to-air and
air-to-ground flight vehicle systems require groundwork that is
contributed, in large part, by the results in this report. The
development of these results was motivated by these Air Force flight
vehicle dynamics issues.
SECTION I
INTRODUCTION 

1.  HISTORY  OF  DIFFERENTIAL  GAMES 

Differential games came into existence in 1954 with the publication
of several RAND Corporation reports written by Rufus Isaacs
(References 1-4). This work, and a great deal more, was incorporated
into his book Differential Games, published in 1965.

The original impetus for his investigations was the problem of
pursuit and evasion of hostile aircraft. Following his original work,
the field of differential games became almost entirely the province of
mathematicians who made no attempt to give any physical meaning to
their results. At the same time that American mathematicians were
delving into the field, Soviet mathematicians also were exploring it
(see the bibliography of Reference 9).

Differential games became better known to engineers in the middle
1960s with publication of Differential Games and its review, and a
small flurry of papers in engineering journals (References 11-15).
Following these came another small flurry of papers involving
stochastic differential games. Today the subject represents a rich
field of investigation to those interested in control theory and the
quantifiable aspects of conflict.




This  exceedingly  short  summary  is  by  no  means  exhaustive  of 
the  work  that  has  been  done. 

2.  THE THEORY OF GAMES

As the name differential games implies, it is a derivative of the
mathematical theory of games first developed by von Neumann and
Morgenstern (Reference 19). Many good books on game theory (far
easier to read and understand than is Theory of Games and Economic
Behavior) are available (References 20-23).

Games  can  have  many  forms  depending  on  the  number  of 
players  and  the  way  in  which  winnings  and  losses  are  computed.  The 
work  herein  involves  only  two  players  where,  essentially,  the  loser 
pays  the  winner  a specified  amount  after  the  play  of  a given  game. 
Since the algebraic sum of the winner's gain (positive) and loser's
loss (negative) is zero, this type of game is known as a two-player,
zero sum game.

Games  may  be  presented  either  in  extensive  form  — a set  of 
rules  and  a succession  of  choices  for  each  player,  or  in  normal 
form  — a matrix  or  function  which  relates  the  amount  due  to  the 
winner  to  the  choices  made  by  the  two  players.  The  amount,  as  a 
function  of  the  choices,  is  known  as  the  payoff  of  the  game. 

The choices may involve only a limited number of individual elements
(matrix games) or they may involve an infinite number of elements
(infinite or continuous games). Each player wishes to find a strategy
that allows him to make his choices (choose his control) so as to
optimize the payoff. In general, these strategies are functions of the
payoff. These strategies may be: 1) pure: given a fully specified
payoff function, there is a single value of the control which should be
chosen; or 2) randomized (mixed): given a fully specified payoff
function, there is a probability density function which should be
chosen, with the actual control found by a chance device having an
output governed by the optimal probability density.

It  can  be  shown  that  every  matrix  game,  and  some  types  of 
continuous  games,  admits  of  a pair  of  strategies,  pure  or  mixed, 
such  that,  if  the  minimizing  player  uses  his  optimal  strategy,  the 
payoff to the maximizing player will be no greater than a certain
amount.  Conversely,  if  the  maximizing  player  uses  his  optimal 
strategy,  he  is  assured  of  receiving  at  least  the  same  amount.  This 
amount  is  known  as  the  Value  of  the  game.  The  condition  that 
expresses  the  inequalities  of  the  payoff  is  known  as  the  saddle  point 
condition.  Control  strategies  that  satisfy  the  saddle  point  condition 
are  such  that  both  players  can  compute  them,  or,  equivalently,  that 
both  players  may  announce  their  strategies  and  still  be  assured  that 
the  payoff  will  be  no  worse,  from  each  point  of  view,  than  the  Value. 
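
As an illustration of the saddle point condition for a matrix game,
the following minimal sketch checks whether a payoff matrix has a
saddle point in pure strategies. The function name and the convention
that rows belong to the maximizing player are assumptions made for this
example only; they do not come from the report.

```python
def pure_saddle_point(payoff):
    """Look for a saddle point in pure strategies of a matrix game.

    Assumed convention (illustration only): payoff[i][j] is the amount
    paid to the maximizing player when he picks row i and the
    minimizing player picks column j.
    Returns (row, column, value), or None when only mixed strategies
    can achieve the Value.
    """
    # Worst case for the maximizer in each row.
    row_mins = [min(row) for row in payoff]
    # Worst case for the minimizer in each column.
    col_maxs = [max(col) for col in zip(*payoff)]

    maximin = max(row_mins)   # what the maximizer can guarantee
    minimax = min(col_maxs)   # what the minimizer can hold him to

    if maximin != minimax:
        return None           # no pure saddle point
    i = row_mins.index(maximin)
    j = col_maxs.index(minimax)
    return i, j, payoff[i][j]


# A game with a pure saddle point, and matching pennies, which has none.
with_saddle = [[4, 2, 3],
               [1, 0, 5]]
matching_pennies = [[1, -1],
                    [-1, 1]]
```

When the guaranteed amounts of the two players coincide, either player
may announce his choice without being hurt, which is the property
described above.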

An  important  concept  in  game  theory  is  information.  In  games 
of perfect information, each player knows the exact value of the
payoff and all that has occurred in the past. Such games as chess and
checkers  are  examples  of  games  of  perfect  information.  It  can  be 
shown  that  such  games  have  saddle  points  for  pure  strategies  for  both 
sides.  Games  of  imperfect  information  have  some  elements  that  are 
unknown  or  given  by  a probability  distribution;  bridge  is  such  a game. 
Games  of  imperfect  information  may  or  may  not  have  pure  strategies 
that  satisfy  the  saddle  point  condition. 
3.  DIFFERENTIAL GAMES

Differential games involve a payoff which is in some way related to a
dynamical system. The two players attempt to optimize this payoff by
choosing game optimal control strategies. The evolution of the state,
as a function of the players' control variables, is described by
differential equations (continuous time) or difference equations
(discrete time); thus the name.

Because  of  the  interaction  among  the  state  of  the  system,  the 
dynamical  relationship  between  the  state  and  the  controls,  and  the 
payoff,  it  is  necessary  that  the  game  optimal  control  strategies  be 
feedback  strategies,  ones  that  use  information  on  the  current  state. 
Thus  the  central  problem  for  the  engineer  is  to  find  these  feedback 
strategies.  (Mathematicians  must  still  deal  with  the  many  unsolved 
problems  concerning  the  existence,  uniqueness,  and  optimality  of 
solutions.)

In general, there is no reason to assume that a given differential
game has a Value or that its Value, when it exists, is obtained by
using pure strategies. There are problems arising from the fact that
the payoff is generally a continuous functional of the controls and,
as noted in Section II, not all continuous payoffs admit of a Value,
or, if they do, a Value resulting from pure strategies. Happily, there
are cases where the structure of the payoff and the dynamical
equations do result in a Value obtained by using pure strategies.

Differential  games  can  be  divided  into  classes:  those  where 
observations  of  the  state  are  perfect,  and  those  where  they  are  not. 


they have involved linear observations of the state corrupted by
additive noise. The addition of such noise means that a deterministic
payoff  function  is  no  longer  meaningful;  instead,  it  is  replaced  with 
an  expected  value  which  is  then  optimized. 

It  must  be  noted  that  the  descriptions  of  the  theory  of  games 
and of differential games are included mainly for orientation purposes;
they are neither rigorous nor complete. Full expositions are to be
found  among  the  references  already  cited. 
SECTION II

MULTISTAGE  DIFFERENTIAL  GAMES  WITH  PERFECT 
INFORMATION 


1.  INTRODUCTION 

Many  papers  published  in  the  last  15  years  have  dealt  with 
differential  games  involving  perfect  information.  Almost  without 
exception,  they  have  dealt  with  problems  leading  to  pure  strategies. 
Some  of  the  exceptions  are  noted  in  References  25  through  27;  in 
addition,  Chapter  12  of  Reference  5 discusses  the  subject. 

In  a sense,  this  is  a very  surprising  turn  of  events  since  the 
whole  theory  of  differential  games  is  based  on  the  theory  of  games  — 
a discipline  which  is  more  concerned  with  randomized  strategies  than 
with  pure  ones.  Further,  until  quite  recently,  the  limited  work  done 
in the field of differential games had been performed by
mathematicians (as opposed to engineers) who might have been thought to be
more  interested  in  questions  of  randomized  strategies. 

While engineers involved in doing research are quick to use some of
the better known mathematical results in differential games involving
pure strategies (Reference 28), there seems to be no such inclination
with regard to the theory of sequentially compounded two-person games
(References 21, 23). Thus the available work in stochastic and
recursive games has not received its due attention.

The work in this chapter is directed toward the complete solution of
a multistage linear differential game with a quadratic payoff
function. For convenience, only the scalar case is considered, but, at
the cost of more work, the results go over identically when the
dynamics are given in terms of matrix equations.

The solution is complete in that both deterministic and randomized
control strategies, as required by the relative value of system
parameters, are derived. To solve this problem, it is assumed that
both players know the system parameters and are able to observe the
true state of the system at all times.

2.  DERIVATION  OF  PURE  CONTROL  STRATEGIES  WHEN 
CONTROL  MAGNITUDES  ARE  UNBOUNDED 

The evolution of the state of the system is determined by a linear
difference equation

\[ z_{i-1} = k_i z_i + a_i u_i + b_i v_i \tag{1} \]

where

\[ z_i = \text{state at the } i\text{th stage} \tag{2} \]

\[ u_i = \text{minimizing player's control at the } i\text{th stage} \tag{3} \]

\[ v_i = \text{maximizing player's control at the } i\text{th stage} \tag{4} \]

Both $u_i$ and $v_i$ can be functions of any or all of the past and present
values  of  the  state.  (In  game  theory,  the  maximizing  player  is 
generally  referred  to  as  player  I and  the  minimizing  player  as 
player  II;  these  designations  also  are  occasionally  used  here.) 

The subscript indicates the stage number. Equation (1) is written in
terms of time (stages)-to-go rather than the usual time measured from
the initiation of the differential game. This is done because the
stage number represents the number of times each player must choose a
control, that is, the actual value for $u_i$ or $v_i$. Thus an $N$ stage
game, which begins in state $z_N$ and terminates in state $z_0$,
requires $N$ choices of each player's control.

The  object  of  the  differential  game  is  to  optimize,  in  a game 
sense,  a quadratic  payoff  function 


\[ I_N(U, V) = \sum_{i=1}^{N} \left[ c_i z_{i-1}^2 + d_i u_i^2 - e_i v_i^2 \right] \tag{5} \]


That is, the minimizing player wishes to choose $u_N, \ldots, u_1$ so as
to minimize $I_N$, while the maximizing player wishes to pick
$v_N, \ldots, v_1$ so as to maximize it. Because there is only a single
payoff functional, the problem falls within the general purview of
zero sum game theory.
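
To make the stages-to-go bookkeeping of (1) and (5) concrete, the
following sketch plays out a game for given control sequences and
accumulates the payoff. The function name, the tuple layout
`(k, a, b, c, d, e)`, and the ordering of the lists (stage N first,
stage 1 last) are assumptions chosen for this illustration.

```python
def play_game(z_start, stages, u_seq, v_seq):
    """Evaluate the final state and the payoff (5) for given controls.

    z_start : initial state z_N
    stages  : list of (k, a, b, c, d, e) tuples, ordered from stage N
              down to stage 1 (assumed layout, illustration only)
    u_seq   : minimizing player's controls u_N, ..., u_1
    v_seq   : maximizing player's controls v_N, ..., v_1
    """
    z = z_start
    payoff = 0.0
    for (k, a, b, c, d, e), u, v in zip(stages, u_seq, v_seq):
        # Equation (1): the state with one fewer stage to go.
        z = k * z + a * u + b * v
        # One term of (5): state cost, plus the minimizer's control
        # cost, minus the maximizer's control cost.
        payoff += c * z ** 2 + d * u ** 2 - e * v ** 2
    return z, payoff


# Two identical stages; parameter values are arbitrary.
two_stages = [(1.0, 1.0, 1.0, 1.0, 1.0, 2.0)] * 2
final_state, total = play_game(1.0, two_stages, [-1.0, 0.0], [0.0, 0.0])
```

Note that the state cost at stage i is charged on the state after the
controls of that stage have acted, which is why `z` is updated before
the payoff term is added.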

The  Value  of  the  game  is  given  by 


\[ J_N = \operatorname*{val}_{U,V} I_N(U, V) = \operatorname*{val}_{U,V} \sum_{i=1}^{N} \left[ c_i z_{i-1}^2 + d_i u_i^2 - e_i v_i^2 \right] \tag{6} \]


where $U$ and $V$ represent the set of $2N$ controls $u_N, \ldots, u_1$,
$v_N, \ldots, v_1$. The parameters of the system, $a_i$, $b_i$, $c_i$,
$d_i$, $e_i$, and $k_i$, are all assumed to be real; $a_i$, $b_i$,
$d_i$, $e_i$, and $c_1$ are assumed to be positive; $c_i$ ($i \neq 1$) is
assumed to be nonnegative; and $k_i$ can be positive, negative, or
zero. These restrictions are required to produce meaningful results.
If $a_i$ or $b_i$ is zero, the associated control can have no effect on
the state at the next stage. Similarly, if $d_i$ or $e_i$ is zero,
there is no penalty associated with the use of a control, and, if
otherwise called for, infinite control magnitudes could be used. If
$c_1$ were zero, the effect would be to terminate the game a stage
early since neither player would wish to incur a penalty in the payoff
by using his control when there was no attendant change due to a
change in state. On the other hand, if $c_i = 0$ ($i \neq 1$), the game
is still well defined as to the number of stages, and the control at
that stage is not, in general, zero for either player. (This is the
analog to terminal control problems in optimal control.) The controls
$u_i$ and $v_i$ are also assumed to be real with no restrictions on
their sign or magnitude.

The principle of optimality of the theory of dynamic programming
(Reference 29), along with simple variational principles, is used to
find the desired control strategies (Reference 30). (Control
strategies are defined to be rules that tell each player the value he
should choose for his control at each stage based on the information
available to him. The control is defined to be the value actually
chosen.)

As is usual in dynamic programming, the single-stage game is first
solved. To do this, assume that pure strategies exist for both
players, strategies that permit each player to use the actual value of
the state. Denoting these game optimal strategies by overbars, it
follows that the Value of the single-stage game, $J_1$, is achieved when

\[ u_1 = \bar{u}_1(z_1) \tag{7} \]

\[ v_1 = \bar{v}_1(z_1) \tag{8} \]


The saddle point condition states that, if one player uses his game
optimal strategy and the other does not, then the payoff is as good or
better than that which would have been achieved if both players had
used their game optimal strategies. In terms of the payoff functional,
(5),

\[ I_1(\bar{u}_1, v_1) \le I_1(\bar{u}_1, \bar{v}_1) = J_1 \le I_1(u_1, \bar{v}_1) \tag{9} \]

where $u_1$ and $v_1$ are any real functions of $z_1$. Putting it
another way,

\[ I_1(u_1, \bar{v}_1) - I_1(\bar{u}_1, \bar{v}_1) \ge 0 \tag{10} \]

\[ I_1(\bar{u}_1, v_1) - I_1(\bar{u}_1, \bar{v}_1) \le 0 \tag{11} \]

If

\[ u_1 = \bar{u}_1 + \epsilon \delta(z_1) \tag{12} \]

where $\epsilon$ is a small number and $\delta(z_1)$ is any real function
of $z_1$, then (1), (5), (10), and (12) lead to


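\[
\begin{aligned}
I_1(\bar{u}_1 + \epsilon\delta, \bar{v}_1) - I_1(\bar{u}_1, \bar{v}_1)
 &= c_1 (k_1 z_1 + a_1 \bar{u}_1 + a_1 \epsilon\delta + b_1 \bar{v}_1)^2
    + d_1 (\bar{u}_1 + \epsilon\delta)^2 - e_1 \bar{v}_1^2 \\
 &\quad - c_1 (k_1 z_1 + a_1 \bar{u}_1 + b_1 \bar{v}_1)^2
    - d_1 \bar{u}_1^2 + e_1 \bar{v}_1^2 \\
 &= 2 \left[ a_1 c_1 k_1 z_1 + (a_1^2 c_1 + d_1) \bar{u}_1
    + a_1 b_1 c_1 \bar{v}_1 \right] \epsilon\delta
    + (a_1^2 c_1 + d_1) \epsilon^2 \delta^2 \ge 0
\end{aligned}
\tag{13}
\]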


where the argument of $\delta$ has been dropped for brevity. Standard
variational arguments lead to the following necessary conditions for
(13) to hold

\[ a_1 c_1 k_1 z_1 + (a_1^2 c_1 + d_1) \bar{u}_1 + a_1 b_1 c_1 \bar{v}_1 = 0 \tag{14} \]

\[ a_1^2 c_1 + d_1 \ge 0 \tag{15} \]





A similar  approach  leads  to 

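\[
I_1(\bar{u}_1, \bar{v}_1 + \epsilon\Delta) - I_1(\bar{u}_1, \bar{v}_1)
 = 2 \left[ b_1 c_1 k_1 z_1 + a_1 b_1 c_1 \bar{u}_1
   - (e_1 - b_1^2 c_1) \bar{v}_1 \right] \epsilon\Delta
   - (e_1 - b_1^2 c_1) \epsilon^2 \Delta^2 \le 0
\tag{16}
\]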


so  that  necessary  conditions  for  (16)  to  hold  are 


\[ b_1 c_1 k_1 z_1 + a_1 b_1 c_1 \bar{u}_1 - (e_1 - b_1^2 c_1) \bar{v}_1 = 0 \tag{17} \]

\[ e_1 - b_1^2 c_1 \ge 0 \tag{18} \]


The  simultaneous  solution  of  (14)  and  (17)  results  in  the  game 
optimal  control  strategies  for  the  single-stage  game 


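\[ \bar{u}_1 = - \frac{a_1 c_1 e_1 k_1 z_1}{(a_1^2 c_1 + d_1)(e_1 - b_1^2 c_1) + a_1^2 b_1^2 c_1^2} \tag{19} \]

\[ \bar{v}_1 = \frac{b_1 c_1 d_1 k_1 z_1}{(a_1^2 c_1 + d_1)(e_1 - b_1^2 c_1) + a_1^2 b_1^2 c_1^2} \tag{20} \]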


Substitution  of  (1),  (19),  and  (20)  into  (6)  yields 


\[ J_1 = \frac{c_1 d_1 e_1 k_1^2 z_1^2}{(a_1^2 c_1 + d_1)(e_1 - b_1^2 c_1) + a_1^2 b_1^2 c_1^2} \tag{21} \]


since $\bar{u}_1$ and $\bar{v}_1$ are precisely the pure control
strategies which optimize, in a game theory sense, the payoff
functional, (5). Rewriting the denominator of (21) as

\[ (a_1^2 c_1 + d_1)(e_1 - b_1^2 c_1) + a_1^2 b_1^2 c_1^2 = c_1 (a_1^2 e_1 - b_1^2 d_1) + d_1 e_1 \tag{22} \]
indicates that a sufficient condition for pure strategies to exist is

\[ a_1^2 e_1 - b_1^2 d_1 \ge 0 \tag{23} \]

(It should be clear that (21) will always be nonnegative since the
maximizing player can always choose his control to be zero if it
becomes apparent that other values would lead to a negative payoff.)
When the equality in (23) holds, it is easy to show, using (19), (20),
and (1), that

\[ z_0 = k_1 z_1 \tag{24} \]

so that the change of state is not affected when both players use
their game optimal strategies. This might be called a case of equal
efficiency.
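
The single-stage results can be checked numerically. The sketch below
evaluates (19), (20), and (21) and compares the resulting payoff with
the payoff obtained when one player deviates, which is the saddle
point condition (10) and (11). Function names and the parameter values
are assumptions chosen for illustration; the values satisfy (15), (18),
and (23).

```python
def single_stage_solution(k, a, b, c, d, e, z):
    """Pure strategies (19), (20) and Value (21) of the one-stage game.

    Assumes the second-order conditions (15) and (18) hold; no bounds
    are placed on the controls.
    """
    denom = (a * a * c + d) * (e - b * b * c) + a * a * b * b * c * c
    u_bar = -a * c * e * k * z / denom
    v_bar = b * c * d * k * z / denom
    value = c * d * e * k * k * z * z / denom
    return u_bar, v_bar, value


def single_stage_payoff(k, a, b, c, d, e, z, u, v):
    """Payoff (5) with N = 1, using the dynamics (1) for z_0."""
    z0 = k * z + a * u + b * v
    return c * z0 ** 2 + d * u ** 2 - e * v ** 2


# Arbitrary illustrative parameters: k, a, b, c, d, e and state z_1.
params = (1.0, 1.0, 1.0, 1.0, 1.0, 2.0)
z1 = 1.0
u_bar, v_bar, value = single_stage_solution(*params, z1)

# Any deviation by the minimizer should not lower the payoff, and any
# deviation by the maximizer should not raise it.
worse_for_u = single_stage_payoff(*params, z1, u_bar + 0.1, v_bar)
worse_for_v = single_stage_payoff(*params, z1, u_bar, v_bar + 0.1)
```

For these values the denominator equals 3, so the strategies are
$\bar{u}_1 = -2/3$ and $\bar{v}_1 = 1/3$, and the Value is $2/3$.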

Having  found  the  game  optimal  pure  strategies  for  a single- 
stage  game,  it  is  now  possible  to  solve  the  two-stage  game.  Applying 
the  principle  of  optimality  yields 


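\[ J_2 = \operatorname*{val}_{u_2, v_2} \left[ c_2 z_1^2 + d_2 u_2^2 - e_2 v_2^2 + J_1 \right] \tag{25} \]

Substituting (21) into (25) leads to

\[ J_2 = \operatorname*{val}_{u_2, v_2} \left[ \tilde{c}_2 z_1^2 + d_2 u_2^2 - e_2 v_2^2 \right] \tag{26} \]

where

\[ \tilde{c}_2 = c_2 + \frac{c_1 e_1 d_1 k_1^2}{(a_1^2 c_1 + d_1)(e_1 - b_1^2 c_1) + a_1^2 b_1^2 c_1^2} \tag{27} \]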
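The use of (27) has changed the two-stage problem into a single-stage
problem so that all that has gone before applies now with the
subscript 2 replacing 1 and $\tilde{c}_2$ used instead of $c_2$. Thus

\[ \bar{u}_2 = - \frac{a_2 \tilde{c}_2 e_2 k_2 z_2}{(a_2^2 \tilde{c}_2 + d_2)(e_2 - b_2^2 \tilde{c}_2) + a_2^2 b_2^2 \tilde{c}_2^2} \tag{28} \]

\[ \bar{v}_2 = \frac{b_2 \tilde{c}_2 d_2 k_2 z_2}{(a_2^2 \tilde{c}_2 + d_2)(e_2 - b_2^2 \tilde{c}_2) + a_2^2 b_2^2 \tilde{c}_2^2} \tag{29} \]

\[ J_2 = \frac{\tilde{c}_2 d_2 e_2 k_2^2 z_2^2}{(a_2^2 \tilde{c}_2 + d_2)(e_2 - b_2^2 \tilde{c}_2) + a_2^2 b_2^2 \tilde{c}_2^2} \tag{30} \]

with sufficient conditions being

\[ a_2^2 \tilde{c}_2 + d_2 > 0 \tag{31} \]

\[ e_2 - b_2^2 \tilde{c}_2 > 0 \tag{32} \]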


(The  other  necessary  conditions  involving  the  first  variation  are 
inherent  in  (28)  and  (29).) 


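The general solution for any number of stages is quite clear. For
$i = 1, \ldots, N$, the Value is quadratic in the state,
$J_i = A_i z_i^2$, where, by the pattern of (21) and (30),

\[ A_i = \frac{\tilde{c}_i d_i e_i k_i^2}{(a_i^2 \tilde{c}_i + d_i)(e_i - b_i^2 \tilde{c}_i) + a_i^2 b_i^2 \tilde{c}_i^2} \]

\[ \tilde{c}_i = c_i + A_{i-1}; \quad A_0 = 0 \tag{35} \]

\[ \bar{u}_i = - \frac{a_i}{d_i k_i} A_i z_i = - \frac{a_i}{2 d_i k_i} \frac{\partial J_i}{\partial z_i} \tag{36} \]

\[ \bar{v}_i = \frac{b_i}{e_i k_i} A_i z_i = \frac{b_i}{2 e_i k_i} \frac{\partial J_i}{\partial z_i} \tag{37} \]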


(The  fact  that  the  game  optimal  controls  are  specifically  related  to  the 
Value  of  the  game  is  no  coincidence;  the  same  behavior  is  exhibited  in 
optimal  control  problems  involving  linear  dynamics  and  quadratic 
cost functionals.)
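
The stage-by-stage recursion can be written as a short backward sweep.
The sketch below is an illustration under stated assumptions: the
function name and tuple layout are hypothetical, stages are listed
from stage 1 (the last one played) up to stage N, and the gains are
computed in the form of (28) and (29) so that no division by $k_i$ is
needed.

```python
def backward_sweep(stages):
    """Value coefficients and feedback gains for the N stage game.

    stages : list of (k, a, b, c, d, e) for i = 1, ..., N (assumed
             layout, illustration only).
    Returns one (A_i, u_gain_i, v_gain_i) tuple per stage, where
    J_i = A_i * z_i**2, u_bar_i = u_gain_i * z_i, and
    v_bar_i = v_gain_i * z_i.
    """
    results = []
    A_prev = 0.0                      # A_0 = 0, as in (35)
    for k, a, b, c, d, e in stages:
        c_tilde = c + A_prev          # equation (35)
        # Second-order conditions in the form of (31) and (32).
        if a * a * c_tilde + d <= 0 or e - b * b * c_tilde <= 0:
            raise ValueError("pure strategy conditions fail at this stage")
        denom = ((a * a * c_tilde + d) * (e - b * b * c_tilde)
                 + a * a * b * b * c_tilde * c_tilde)
        A = c_tilde * d * e * k * k / denom
        u_gain = -a * c_tilde * e * k / denom
        v_gain = b * c_tilde * d * k / denom
        results.append((A, u_gain, v_gain))
        A_prev = A
    return results


# Two identical stages with arbitrary illustrative parameters.
sweep = backward_sweep([(1.0, 1.0, 1.0, 1.0, 1.0, 2.0)] * 2)
```

For these values the sweep gives $A_1 = 2/3$ and $A_2 = 10/11$, and the
gains agree with (36) and (37), for example
$\bar{u}_2 = -(a_2 / d_2 k_2) A_2 z_2 = -(10/11) z_2$.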

These  results  are,  of  course,  not  new.  A slightly  different 
approach  was  used  in  Reference  32.  In  the  case  of  continuous,  instead 
of  discrete,  dynamics,  the  analogous  result  was  obtained  in  Reference  12 
using  straightforward  variational  arguments,  and,  in  Reference  33, 
functional  analysis  techniques  were  used. 

3.  RANDOMIZED  CONTROL  STRATEGIES 

Prior to this section and in the following chapters, only the pure
strategy aspects of game theory are used to derive control strategies.
In this section, a wider (but still very limited) appeal is made to
other aspects. In particular, the following derivation is based on the
theory of infinite convex games (References 20, 34).

Roughly speaking, an infinite convex game is one where each player is
free to choose his control from a region of an appropriately (finite)
dimensioned Euclidean space. The game is called infinite because of
the infinite number of possible choices of a control within the region
(as opposed to a finite number of possible choices in a matrix game).
The game is called convex if the scalar payoff function is convex in
the minimizing player's control variable.

It  has  been  shown  that,  when  the  payoff  is  both  convex  in  the 
minimizing  control  variable  and  continuous  in  both  control  variables, 
the  game  has  a Value.  Further,  if  the  minimizing  control  must  take 
its  value  from  a compact  and  convex  region  of  the  control  space,  then 
the  minimizing  player  has  a pure  strategy.  Finally,  if  the  minimizing 
player  must  choose  his  control  from  an  arbitrary  n dimensional 
region,  then  the  optimal  control  strategy  of  the  maximizing  player  will 
require randomization over, at most, $n + 1$ points.

The  theory  of  infinite  convex  games  also  describes  how  to  find 
solutions  (control  strategies)  for  each  player.  The  application  of  this 
theory  to  the  scalar  case  is  carried  out  in  the  following  work  as  an 
illustration. The pertinent theorems, suitably paraphrased in terms of
notation,  are  taken  from  Chapter  12  of  Reference  20  and  given,  without 
proof,  in  the  Appendix. 

Both payoff, (5), and the dynamics of the system, (1), remain
unchanged. In this section, however, it is assumed that both $u_i$ and
$v_i$ are limited, in absolute value, to be less than or equal to one.
That is, both $u_i$ and $v_i$ belong to sets $U$ and $V$ such that

\[ U = \{ u_i : |u_i| \le 1 \} \tag{38} \]

\[ V = \{ v_i : |v_i| \le 1 \} \tag{39} \]

(Previously, it was assumed that $u_i$ and $v_i$ could take on any value.)
As before, the principle of optimality is used to derive the control
strategies for all $N$ stages. Before, at each stage, the Value was
found from the (implicit) fact that

\[ J_i = \operatorname*{val}_{u_i, v_i} I_i(u_i, v_i) = \min_{u_i} \max_{v_i} I_i(u_i, v_i) = \max_{v_i} \min_{u_i} I_i(u_i, v_i) \tag{40} \]

In general, that is, when pure strategies do not exist for both
players, this is not true. It is true, however, that for the game
under consideration,

\[ J_i = \min_{u_i} \max_{v_i} I_i(u_i, v_i) \tag{41} \]

as given in the theorem in the Appendix. Note that the order of the
min-max operations is strictly a result of the convexity of the payoff
in $u_i$, which implies a pure strategy for player II. Having found
$J_i$, $\bar{u}_i$ is found from the solution of

\[ J_i = \max_{v_i} I_i(\bar{u}_i, v_i) \tag{42} \]

Equations (41) and (42) provide the tools needed to solve the problem
at hand.


As usual, the single-stage game is solved first. Since

\[ \frac{\partial^2 I_1}{\partial u_1^2} = 2 (a_1^2 c_1 + d_1) \tag{43} \]

\[ \frac{\partial^2 I_1}{\partial v_1^2} = -2 (e_1 - b_1^2 c_1) \tag{44} \]

a pure strategy $\bar{u}_1$ for player II exists whenever

\[ a_1^2 c_1 + d_1 > 0 \tag{45} \]

Player I's strategy then calls for randomization over, at most, two
points (randomization over a single point is the same thing as a pure
strategy). Completely analogous results are obtained for a payoff that
is concave in the maximizing player's control variable, i.e., when

\[ e_1 - b_1^2 c_1 > 0 \tag{46} \]

A pure strategy exists for player I, and player II's optimal strategy
requires randomization over, at most, two points.

The rest of this chapter is concerned with the case where the payoff
is strictly convex in $u_i$ and where it may or may not be concave in
$v_i$. (The last section dealt with a payoff which was convex in $u_i$,
giving a pure strategy for player II, and which was concave in $v_i$,
giving a pure strategy for player I.)

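Making use of (1) and (5), it follows that

\[
\begin{aligned}
I_1(u_1, v_1) &= c_1 (k_1 z_1 + a_1 u_1 + b_1 v_1)^2 + d_1 u_1^2 - e_1 v_1^2 \\
 &= A + B v_1 + C v_1^2 \\
 &= C \left( v_1 + \frac{B}{2C} \right)^2 + A - \frac{B^2}{4C}
\end{aligned}
\tag{47}
\]

where

\[ A = c_1 (k_1 z_1 + a_1 u_1)^2 + d_1 u_1^2 \tag{48} \]

\[ B = 2 b_1 c_1 (k_1 z_1 + a_1 u_1) \tag{49} \]

\[ C = c_1 b_1^2 - e_1 \tag{50} \]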

The payoff, as a function of $v_1$, can take on either the shape of a
straight line ($C = 0$) or a parabola ($C \neq 0$); if $C > 0$, then the
parabola opens upward; if $C < 0$, it opens downward. If it opens
downward, then the payoff is concave in $v_1$, and so a pure strategy
exists for player I.

The parabola is symmetric about the line

\[ v_1 = - \frac{B}{2C} = - \frac{b_1 c_1 (k_1 z_1 + a_1 u_1)}{b_1^2 c_1 - e_1} \tag{51} \]

and, of course, the maximum ($C < 0$) or minimum ($C > 0$) value of
$I_1(u_1, v_1)$ occurs at the same value of $v_1$. It is important to
know where the maximum or minimum value occurs as a function of $u_1$
and $z_1$.

4.  DERIVATION  OF  PURE  CONTROL  STRATEGIES  WHERE 
CONTROL  MAGNITUDES  ARE  BOUNDED 

It is instructive to see what happens to pure strategies when the
magnitudes of $u_i$ and $v_i$ are constrained to belong to $U$ and $V$,
respectively, instead of taking on any value. The major result is,
naturally, to complicate the form of the solution since the control
strategies are no longer linear functions of the state. The theorems
given in the Appendix are used to find the solution.
It has already been noted that pure strategies exist for both players
whenever (45) and (46) hold (for this section the same assumption
holds). However, because of the constraints on the magnitude of $u_1$
and $v_1$, it is not possible to simply apply the techniques of
Section II to this problem. Instead, the optimal strategies for the
single-stage game are found by:

1)  Finding $\bar{v}_1(u_1)$ such that $I_1(u_1, \bar{v}_1(u_1)) = \max_{v_1} I_1(u_1, v_1)$

2)  Finding $\bar{u}_1$ such that $\min_{u_1} I_1(u_1, \bar{v}_1(u_1)) = J_1$

(The technique is the same whether the control strategies are pure or
randomized.) Both $\bar{u}_1$ and $\bar{v}_1$, the optimal strategies,
are functions of the state $z_1$.

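From (51) it follows that $\bar{v}_1$ is equal to minus one whenever

\[ - \frac{b_1 c_1 (k_1 z_1 + a_1 u_1)}{b_1^2 c_1 - e_1} \le -1 \tag{52} \]

or whenever

\[ u_1 \le \frac{b_1^2 c_1 - e_1}{a_1 b_1 c_1} - \frac{k_1 z_1}{a_1} \tag{53} \]

Also $\bar{v}_1$ is equal to plus one whenever

\[ - \frac{b_1 c_1 (k_1 z_1 + a_1 u_1)}{b_1^2 c_1 - e_1} \ge 1 \tag{54} \]

which occurs when

\[ u_1 \ge - \frac{b_1^2 c_1 - e_1}{a_1 b_1 c_1} - \frac{k_1 z_1}{a_1} \tag{55} \]

When neither (53) nor (55) holds, $\bar{v}_1$ is given by (51)
directly. Thus $\bar{v}_1(u_1)$ is given by

\[
\bar{v}_1(u_1) =
\begin{cases}
-1 ; & -1 \le u_1 \le \dfrac{b_1^2 c_1 - e_1}{a_1 b_1 c_1} - \dfrac{k_1 z_1}{a_1} \\[2ex]
- \dfrac{b_1 c_1 (k_1 z_1 + a_1 u_1)}{b_1^2 c_1 - e_1} ; &
 \dfrac{b_1^2 c_1 - e_1}{a_1 b_1 c_1} - \dfrac{k_1 z_1}{a_1} < u_1 <
 - \dfrac{b_1^2 c_1 - e_1}{a_1 b_1 c_1} - \dfrac{k_1 z_1}{a_1} \\[2ex]
+1 ; & - \dfrac{b_1^2 c_1 - e_1}{a_1 b_1 c_1} - \dfrac{k_1 z_1}{a_1} \le u_1 \le 1
\end{cases}
\tag{56}
\]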

It may happen that some of the sets of inequalities are incompatible
in (56). For instance, it could be that

\[ \frac{b_1^2 c_1 - e_1}{a_1 b_1 c_1} - \frac{k_1 z_1}{a_1} < -1 \tag{57} \]

As an example, this could occur for $k_1 z_1$ sufficiently large and
positive. The meaning of such an incompatibility is merely that such a
value for $\bar{v}_1(u_1)$ is never an optimal choice, since either the
value from the middle inequality of (56) or $+1$ would be chosen. From
a (somewhat) practical point of view, replace any expression in the
inequalities of (56) whose absolute magnitude is greater than one by
one times the algebraic sign of the expression. Disregard any values
of $\bar{v}_1(u_1)$ where the right side of an expression minus the left
side (where no expression is larger than one in absolute magnitude) is
zero.

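Defining

\[
D =
\begin{cases}
-1 ; & \dfrac{b_1^2 c_1 - e_1}{a_1 b_1 c_1} - \dfrac{k_1 z_1}{a_1} \le -1 \\[2ex]
\dfrac{b_1^2 c_1 - e_1}{a_1 b_1 c_1} - \dfrac{k_1 z_1}{a_1} ; &
 -1 < \dfrac{b_1^2 c_1 - e_1}{a_1 b_1 c_1} - \dfrac{k_1 z_1}{a_1} < 1 \\[2ex]
+1 ; & 1 \le \dfrac{b_1^2 c_1 - e_1}{a_1 b_1 c_1} - \dfrac{k_1 z_1}{a_1}
\end{cases}
\tag{58}
\]

\[
E =
\begin{cases}
-1 ; & - \dfrac{b_1^2 c_1 - e_1}{a_1 b_1 c_1} - \dfrac{k_1 z_1}{a_1} \le -1 \\[2ex]
- \dfrac{b_1^2 c_1 - e_1}{a_1 b_1 c_1} - \dfrac{k_1 z_1}{a_1} ; &
 -1 < - \dfrac{b_1^2 c_1 - e_1}{a_1 b_1 c_1} - \dfrac{k_1 z_1}{a_1} < 1 \\[2ex]
+1 ; & 1 \le - \dfrac{b_1^2 c_1 - e_1}{a_1 b_1 c_1} - \dfrac{k_1 z_1}{a_1}
\end{cases}
\tag{59}
\]

The Value of the single-stage game is given by

\[
J_1 = \min \left\{
\begin{aligned}
& \min_{-1 \le u_1 \le D} \left[ c_1 (k_1 z_1 + a_1 u_1 - b_1)^2 + d_1 u_1^2 - e_1 \right] ; \\
& \min_{D < u_1 < E} \left[ - \frac{c_1 e_1 (k_1 z_1 + a_1 u_1)^2}{b_1^2 c_1 - e_1} + d_1 u_1^2 \right] ; \\
& \min_{E \le u_1 \le 1} \left[ c_1 (k_1 z_1 + a_1 u_1 + b_1)^2 + d_1 u_1^2 - e_1 \right]
\end{aligned}
\right\}
\tag{60}
\]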
Equation (60) is precisely the implementation of (41). For any value
of the state, $z_1$, it is a straightforward task to find the value of
$u_1$ which minimizes any of the pertinent expressions in (60). The
least of the three is then the Value of the game. The optimal control
for $v_1$ is found from (56). Thus the single-stage game is completely
solved.
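
The two-step procedure behind (56) and (60) can be sketched
numerically. The code below is an illustration only: it assumes the
concave case (46), uses the clipped vertex of the parabola (51) as the
maximizing player's best response, and replaces the three-way
minimization of (60) by a simple grid search over $u_1$ in $U$. The
function names and the grid size are assumptions.

```python
def best_response_v(k, a, b, c, d, e, z, u):
    """Maximizing control for a given u, as in (56).

    Assumes e - b*b*c > 0, so the payoff is concave in v and the
    maximizer is the vertex (51) clipped to the interval [-1, 1].
    """
    C = b * b * c - e                      # equation (50), negative here
    B = 2.0 * b * c * (k * z + a * u)      # equation (49)
    return max(-1.0, min(1.0, -B / (2.0 * C)))


def bounded_single_stage(k, a, b, c, d, e, z, n=2001):
    """Approximate (41) by a grid search over u in [-1, 1].

    Returns (value, u, v) at the best grid point.
    """
    best = None
    for j in range(n):
        u = -1.0 + 2.0 * j / (n - 1)
        v = best_response_v(k, a, b, c, d, e, z, u)
        payoff = c * (k * z + a * u + b * v) ** 2 + d * u * u - e * v * v
        if best is None or payoff < best[0]:
            best = (payoff, u, v)
    return best


# Arbitrary illustrative parameters (k, a, b, c, d, e).
params = (1.0, 1.0, 1.0, 1.0, 1.0, 2.0)
small_state = bounded_single_stage(*params, 1.0)   # bounds inactive
large_state = bounded_single_stage(*params, 5.0)   # both controls saturate
```

For the small state the bounds are inactive and the result agrees with
the unbounded Value of $2/3$ to within the grid resolution. For the
large state both controls sit on the boundary, $u_1 = -1$ and
$v_1 = +1$, which illustrates why the strategies are no longer linear
in the state.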

The  solution  to  the  two-stage  game  is  still  given  by  (25). 
Unfortunately,  in  the  general  case,  it  is  no  longer  possible  to  express 
the  control  strategies  and  the  Value  by  simple  expressions  similar  to 
(28),  (29),  and  (30).  This  comes  as  no  particular  surprise  since  it  is 
a result of the nonlinearity of the control strategies and is not
specifically related to differential games. The same behavior is exhibited
when  dynamic  programming  is  used  to  solve  optimal  control  problems 
involving  constraints  on  the  controls. 

The  same  sort  of  a statement  can  be  made  concerning  the 
solution  to  the  N stage  game.  The  problems  encountered  are  those 
inherent  in  dynamic  programming;  no  more  theory  is  required. 
Accordingly, nothing more will be said about the case where the
payoff is concave in the maximizing control variable.

5.  DERIVATION OF RANDOMIZED CONTROL STRATEGIES

This section deals with a case where the payoff function is not
strictly concave, that is, it is assumed that

\[ b_1^2 c_1 - e_1 \ge 0 \tag{61} \]

(This is actually equivalent to saying that the payoff is convex in
the maximizing control since every quadratic expression is either a
concave or a convex function.)
It follows immediately from the second line of (47) that the maximum
of $I_1(u_1, v_1)$ is achieved for

\[ \bar{v}_1 = \operatorname{sgn}[B] = \operatorname{sgn}[b_1 c_1 (k_1 z_1 + a_1 u_1)] \tag{62} \]

where

\[ \operatorname{sgn}[x] = \frac{x}{|x|} ; \quad x \neq 0 \tag{63} \]

and is to be defined for $x = 0$.

Consider first what happens when

\[ \frac{k_1 z_1}{a_1} > 1 \tag{64} \]

When (64) holds, it is impossible for any choice of $u_1$ to change
the sign of $B$; it must remain positive, which, by (62), means that
the Value is given by

\[ J_1 = \min_{u_1} \left[ c_1 (k_1 z_1 + a_1 u_1 + b_1)^2 + d_1 u_1^2 - e_1 \right] \tag{65} \]

The optimal value for $u_1$ is

\[ \bar{u}_1 = \max \left\{ -1, \; - \frac{a_1 c_1 (k_1 z_1 + b_1)}{a_1^2 c_1 + d_1} \right\} \tag{66} \]

When

\[ \frac{k_1 z_1}{a_1} < -1 \tag{67} \]

then the same reasoning leads to the conclusion that

\[ J_1 = \min_{u_1} \left[ c_1 (k_1 z_1 + a_1 u_1 - b_1)^2 + d_1 u_1^2 - e_1 \right] \tag{68} \]

where

\[ \bar{u}_1 = \min \left\{ 1, \; - \frac{a_1 c_1 (k_1 z_1 - b_1)}{a_1^2 c_1 + d_1} \right\} \tag{69} \]


In both (66) and (69), $\bar{u}_1$ does not take on a value on the
boundary of $U$ only if the weighting on the square of the control,
$d_1$, is sufficiently large. Combining the two expressions shows that
$\bar{u}_1$ is an interior point of $U$ only if

\[ d_1 > a_1 c_1 (|k_1 z_1| + b_1) - a_1^2 c_1 > 0 \tag{70} \]


Even though the minimizing control may be forced to assume an
interior value of $U$ (not reducing the state component of the cost as
much as possible), the maximizing player need never worry about a
similar condition. This follows from the fact that the increase in the
payoff due to a change in the state is greater than the cost incurred
due to the use of the maximizing control; this is the real meaning of
the payoff not being strictly concave in $v_1$. The concrete result is
that player I always uses the largest magnitude of the control available
to him for these two cases.

Finally, there is the case where

$$ \left| \frac{k_1 z_1}{a_1} \right| < 1 \tag{71} $$

which is the most interesting of all, since it can lead to randomized
control strategies. In this case, $\operatorname{sgn}[k_1 z_1 + a_1 u_1]$ may be equal to
plus or minus one or $\operatorname{sgn}[0]$, depending on $u_1$. Because $v_1$ is well
defined when $k_1 z_1 + a_1 u_1$ is not equal to zero, it is possible to write

$$ J_1 = \min \left\{ \min_{-1 \le u_1 < -k_1 z_1/a_1} \left[ c_1 (k_1 z_1 + a_1 u_1 - b_1)^2 + d_1 u_1^2 - e_1 \right];\ \max_{v_1} \left[ c_1 (b_1 v_1)^2 + d_1 \left( \frac{k_1 z_1}{a_1} \right)^2 - e_1 v_1^2 \right]_{u_1 = -k_1 z_1/a_1};\ \min_{-k_1 z_1/a_1 < u_1 \le 1} \left[ c_1 (k_1 z_1 + a_1 u_1 + b_1)^2 + d_1 u_1^2 - e_1 \right] \right\} \tag{72} $$


Consider the middle term first. Since $b_1^2 c_1 - e_1$ is nonnegative,
the middle term is

$$ \max_{v_1} \left[ (b_1^2 c_1 - e_1)\, v_1^2 + d_1 \left( \frac{k_1 z_1}{a_1} \right)^2 \right] \tag{73} $$

so that the maximum over $v_1$ (within $V$, of course) is achieved for $v_1$
equal to plus or minus one. Since these are precisely the values used
for $v_1$ in the first and third terms of (72), the two half-open intervals
for $u_1$ can be replaced by closed intervals, and the second term can
be discarded so that

$$ J_1 = \min \left\{ \min_{-1 \le u_1 \le -k_1 z_1/a_1} \left[ c_1 (k_1 z_1 + a_1 u_1 - b_1)^2 + d_1 u_1^2 - e_1 \right];\ \min_{-k_1 z_1/a_1 \le u_1 \le 1} \left[ c_1 (k_1 z_1 + a_1 u_1 + b_1)^2 + d_1 u_1^2 - e_1 \right] \right\} \tag{74} $$

The result of (73) is effectively to define

$$ |\operatorname{sgn}[0]| = 1 \tag{75} $$

although saying nothing about how an algebraic sign is to be attached.

The first term in (74) has a minimum, as a function of $u_1$,
that need not lie between minus one and $-k_1 z_1/a_1$. Using ordinary
calculus, the minimizing $u_1$ is found to be

$$ u_1 = -\frac{a_1 c_1 (k_1 z_1 - b_1)}{a_1^2 c_1 + d_1} \tag{76} $$

which means that the absolute minimum of the term lies within the
half-open interval $[-1, -k_1 z_1/a_1)$ when

$$ -1 \le -\frac{a_1 c_1 (k_1 z_1 - b_1)}{a_1^2 c_1 + d_1} < -\frac{k_1 z_1}{a_1} \tag{77} $$

The left-hand inequality always holds since it can be rewritten as

$$ -1 - \frac{d_1}{a_1^2 c_1} \le -\frac{k_1 z_1}{a_1} + \frac{b_1}{a_1} \tag{78} $$

where the first term on the right of (78) is always equal to or greater
than minus one and the second term on the right is always positive,
while the second term on the left is always negative.

Simple algebra shows that the right-hand inequality holds whenever
$k_1 z_1$ is positive and

$$ d_1 < -\frac{a_1 b_1 c_1}{k_1 z_1 / a_1} \le 0 \tag{79} $$

Since $d_1$ is assumed to be positive, (79) can never hold if $k_1 z_1 > 0$
and, for this case, the absolute minimum occurs for a value of $u_1$
greater than $-k_1 z_1/a_1$. If $k_1 z_1$ is negative and

$$ d_1 > \frac{a_1 b_1 c_1}{|k_1 z_1 / a_1|} \tag{80} $$

then the absolute minimum occurs for a value of $u_1$ less than or
equal to $-k_1 z_1/a_1$.

Investigation of the second term of (74) shows that the absolute
minimum is achieved for values of $u_1$ less than one and falls within
the interval $(-k_1 z_1/a_1, 1]$ whenever $k_1 z_1$ is positive and

$$ d_1 > \frac{a_1 b_1 c_1}{k_1 z_1 / a_1} \tag{81} $$

The absolute minimum occurs for

$$ u_1 = -\frac{a_1 c_1 (k_1 z_1 + b_1)}{a_1^2 c_1 + d_1} \tag{82} $$

so that (76), (82), (80), and (81) can be summed up by saying that if

$$ d_1 > \frac{a_1 b_1 c_1}{|k_1 z_1 / a_1|} \tag{83} $$

then

$$ u_1 = -\frac{a_1 c_1 \left( k_1 z_1 + b_1 \operatorname{sgn}[k_1 z_1] \right)}{a_1^2 c_1 + d_1} \tag{84} $$


If (83) holds, then $v_1$ is given by

$$ v_1 = \operatorname{sgn}[k_1 z_1] \tag{85} $$

The game optimal minimizing control resulting from (84) means that
$k_1 z_1 + a_1 u_1$ has the same sign as $k_1 z_1$.

The Value of the game is then given by

$$ J_1 = \frac{c_1 d_1 \left( |k_1 z_1| + b_1 \right)^2}{a_1^2 c_1 + d_1} - e_1 \tag{86} $$

If (83) does not hold, then the absolute minimum of each
expression falls outside the half-open intervals previously defined.
The minimum is then achieved for

$$ u_1 = -\frac{k_1 z_1}{a_1} \tag{87} $$

with a corresponding Value of

$$ J_1 = b_1^2 c_1 + d_1 \left( \frac{k_1 z_1}{a_1} \right)^2 - e_1 \tag{88} $$


Equation (75) already established that $v_1$ will take on the values
plus or minus one; having this knowledge, Theorem 2 of the Appendix
can be used to find the optimal mixed strategy for the maximizing
player. Using Theorem 2, it is seen that the optimal mixed strategy
involves randomizing over the two points

$$ v_1^1 = -1 \tag{89} $$

$$ v_1^2 = +1 \tag{90} $$

with probabilities $\alpha$ and $(1 - \alpha)$, respectively, where $\alpha$ and $1 - \alpha$ are
nonnegative. The optimal strategy is then given by a probability
distribution:

$$ v_1:\ \alpha H(v_1^1) + (1 - \alpha) H(v_1^2) \tag{91} $$


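where $H(x)$ means the step function with the jump occurring at $x$. An
expression for $\alpha$ is found from the requirement that

$$ \left. \frac{dI_1}{du_1} \right|_{u_1 = -k_1 z_1/a_1,\ v_1 = v_1^1} \le 0 \tag{92} $$

and

$$ \left. \frac{dI_1}{du_1} \right|_{u_1 = -k_1 z_1/a_1,\ v_1 = v_1^2} \ge 0 \tag{93} $$

Since

$$ \frac{dI_1}{du_1} = 2 a_1 c_1 (k_1 z_1 + a_1 u_1 + b_1 v_1) + 2 d_1 u_1 \tag{94} $$

the two derivatives are

$$ \left. \frac{dI_1}{du_1} \right|_{u_1 = -k_1 z_1/a_1,\ v_1 = v_1^1} = -2 a_1 b_1 c_1 - 2 d_1 \left( \frac{k_1 z_1}{a_1} \right) \tag{96} $$

$$ \left. \frac{dI_1}{du_1} \right|_{u_1 = -k_1 z_1/a_1,\ v_1 = v_1^2} = 2 a_1 b_1 c_1 - 2 d_1 \left( \frac{k_1 z_1}{a_1} \right) \tag{97} $$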
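It follows that (96) must be nonpositive and (97) nonnegative. But
those are precisely the requirements that apply when (83) does not
hold. Accordingly, $\alpha$ can be found from the solution of

$$ -\alpha \left[ a_1 b_1 c_1 + d_1 \left( \frac{k_1 z_1}{a_1} \right) \right] + (1 - \alpha) \left[ a_1 b_1 c_1 - d_1 \left( \frac{k_1 z_1}{a_1} \right) \right] = 0 \tag{98} $$

so that

$$ \alpha = \frac{a_1 b_1 c_1 - d_1 (k_1 z_1 / a_1)}{2 a_1 b_1 c_1} \tag{99} $$

and the game optimal maximizing strategy, (91), is

$$ v_1 = \begin{cases} -1 & \text{with probability } \dfrac{a_1 b_1 c_1 - d_1 (k_1 z_1 / a_1)}{2 a_1 b_1 c_1} \\[2ex] +1 & \text{with probability } \dfrac{a_1 b_1 c_1 + d_1 (k_1 z_1 / a_1)}{2 a_1 b_1 c_1} \end{cases} \tag{100} $$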


The  game  optimal  control  strategies  and  the  Value  for  the 
single-stage  game,  as  a function  of  the  state  and  the  various  system 
parameters,  are  summarized  in  Table  1. 
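
The case analysis of this section can be collected into a short routine.
The following is only a sketch under stated assumptions: the function
names `single_stage` and `upper_value` are hypothetical, the control sets
are taken as $|u_1| \le 1$ and $|v_1| \le 1$, and $a_1$, $b_1$, $c_1$, $d_1$ are taken
to be positive with $b_1^2 c_1 - e_1 \ge 0$ as in (61). The brute-force
`upper_value` is included only as an independent check on the formulas.

```python
import math


def single_stage(k, z, a, b, c, d, e):
    """Single-stage strategies and Value, following (64) through (100).

    Assumes a, b, c, d > 0 and b*b*c - e >= 0 (an assumption of this sketch).
    Returns (u, p_minus, value): the minimizing control, the probability
    that the maximizer plays v = -1 (v = +1 otherwise), and the Value.
    """
    kz = k * z
    r = kz / a
    denom = a * a * c + d
    if r >= 1.0:                                    # case (64): v = +1
        u = max(-1.0, -a * c * (kz + b) / denom)    # (66)
        return u, 0.0, c * (kz + a * u + b) ** 2 + d * u * u - e
    if r <= -1.0:                                   # case (67): v = -1
        u = min(1.0, -a * c * (kz - b) / denom)     # (69)
        return u, 1.0, c * (kz + a * u - b) ** 2 + d * u * u - e
    # case (71): |k z / a| < 1
    if r != 0.0 and d > a * b * c / abs(r):         # (83): pure strategies
        s = math.copysign(1.0, kz)
        u = -a * c * (kz + b * s) / denom           # (84)
        value = c * d * (abs(kz) + b) ** 2 / denom - e   # (86)
        return u, (1.0 if s < 0 else 0.0), value
    # (83) fails: u = -k z / a and the maximizer randomizes over +/-1
    u = -r                                          # (87)
    p_minus = (a * b * c - d * r) / (2 * a * b * c)  # (99)
    value = b * b * c + d * r * r - e               # (88)
    return u, p_minus, value


def upper_value(k, z, a, b, c, d, e, n=20001):
    """Grid search for min over u of max over v = +/-1 (check only)."""
    best = float("inf")
    for i in range(n):
        u = -1.0 + 2.0 * i / (n - 1)
        worst = max(c * (k * z + a * u + b * v) ** 2 + d * u * u - e
                    for v in (-1.0, 1.0))
        best = min(best, worst)
    return best
```

For instance, with all parameters equal to one except $e_1 = 0.5$ and
$z_1 = 0.5$, condition (83) fails, so the sketch returns $u_1 = -0.5$ and a
randomized $v_1$; raising $d_1$ to 3 satisfies (83) and gives pure strategies.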

Table  1 illustrates  the  nonlinear  nature  of  the  strategies  as 
functions  of  both  the  state  and  the  relative  values  of  the  system 
parameters. And, for the first time, a situation exists where a
randomized strategy is optimal.

Notice that $e_1$ does not appear except in the Value. This is to
be expected since $|v_1|$ always is equal to one. Accordingly, the
penalty incurred through the use of $v_1$ is always the same.

As  noted  in  the  last  section,  the  nonlinear  nature  of  the  game 
optimal  control  strategies  makes  it  impossible  to  solve  the  multi- 
stage differential  game  explicitly  in  a few  statements.  The  general 
solution  is  still  given  by  functional  equations  of  the  form  of  (25)  using 
the  techniques  outlined  in  this  chapter. 


6.  EXAMPLE 

This section illustrates the solution to a multistage differential
game having randomized strategies. To do this, it is assumed that:

1)  The system parameters a, b, c, d, e, and k are constant

2)  The system parameters a, b, and e are positive

3)  The initial state, $z_N$, is such that $|k z_N / a| < 1$

4)  The control magnitudes are such that $a \ge b$

5)  The weighting on the use of $u_i$ is small enough so that
$d < a b \tilde{c} / |k z_N / a|$, where $\tilde{c}$ is to be defined

Table 1 supplies the solutions to the single-stage game.
Substituting $J_1$ into (25) and replacing "val" by "min max" yield

$$ \begin{aligned} J_2 &= \min_{u_2} \max_{v_2} \left[ c z_1^2 + d u_2^2 - e v_2^2 + b^2 c + d \left( \frac{k z_1}{a} \right)^2 - e \right] \\ &= \min_{u_2} \max_{v_2} \left[ \left( c + \frac{d k^2}{a^2} \right) z_1^2 + d u_2^2 - e v_2^2 + b^2 c - e \right] \end{aligned} \tag{101} $$

Letting

$$ \tilde{c} = c + \frac{d k^2}{a^2} \tag{102} $$

and substituting (1) and (102) into (101) leads to

$$ J_2 = \min_{u_2} \max_{v_2} \left[ \tilde{c} (k z_2 + a u_2 + b v_2)^2 + d u_2^2 - e v_2^2 + b^2 c - e \right] \tag{103} $$


which  has  the  same  form,  except  for  a constant  term,  as  (47).  In 
other  words,  the  same  techniques  used  to  solve  the  single-stage  game 
will  suffice  to  solve  stage  2 of  a two-stage  game. 

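Since the constant term affects only the magnitude of the Value
(and not the control strategies), it follows immediately that

$$ u_2 = -\frac{k z_2}{a} \tag{104} $$

$$ v_2 = \begin{cases} -1 & \text{with probability } \dfrac{a b \tilde{c} - d (k z_2 / a)}{2 a b \tilde{c}} \\[2ex] +1 & \text{with probability } \dfrac{a b \tilde{c} + d (k z_2 / a)}{2 a b \tilde{c}} \end{cases} \tag{105} $$

and that the Value for the two-stage game is given by

$$ J_2 = 2 b^2 c - 2 e + d \left( \frac{k z_2}{a} \right)^2 + d \left( \frac{k b}{a} \right)^2 \tag{106} $$

Repeated application of (25) leads to

$$ J_N = d \left( \frac{k z_N}{a} \right)^2 + N (c b^2 - e) + (N - 1)\, d \left( \frac{k b}{a} \right)^2 \tag{107} $$

with the game optimal control strategies being given by

$$ u_N = -\frac{k z_N}{a} \tag{108} $$

$$ v_N = \begin{cases} -1 & \text{with probability } \dfrac{a b \tilde{c} - d (k z_N / a)}{2 a b \tilde{c}} \\[2ex] +1 & \text{with probability } \dfrac{a b \tilde{c} + d (k z_N / a)}{2 a b \tilde{c}} \end{cases} \tag{109} $$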


The  Value  of  the  game  is  a function  only  of  the  initial  state, 
and  the  number  of  stages,  as  it  should  be.  It  is  not  a random 
variable.  The  same  is  not  true  for  the  trajectory.  It  describes  a 
stochastic  process  taking  on  the  values  plus  or  minus  b at  each 
stage. Thus there are $2^N$ possible paths for the system to follow. It
should be clear now why assumption 4 is required: without it the state
at stage N - 1 would be such that the randomized control strategy
would no longer be necessary. (This is the case, with d and e equal
to zero, covered in Chapter III of Reference 35.)
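
The statement that the Value is not a random variable, even though the
trajectory is, can be checked numerically. The sketch below is
illustrative only: the function names and the parameter values in the
usage note are hypothetical, and the stage cost $c z_{i-1}^2 + d u_i^2 - e v_i^2$
is the form read off from (101). It evaluates the payoff along every one
of the $2^N$ sign sequences for the maximizer while the minimizer plays
$u_i = -k z_i / a$, and compares the result with the closed form (107).

```python
from itertools import product


def value_formula(N, zN, k, a, b, c, d, e):
    """Closed-form Value of the N-stage example, equation (107)."""
    return (d * (k * zN / a) ** 2
            + N * (c * b * b - e)
            + (N - 1) * d * (k * b / a) ** 2)


def path_payoff(vs, zN, k, a, b, c, d, e):
    """Payoff along one realization vs = (v_N, ..., v_1), each +1 or -1.

    The minimizer uses u_i = -k z_i / a at every stage, as in (108).
    """
    z, total = zN, 0.0
    for v in vs:
        u = -k * z / a
        z_next = k * z + a * u + b * v      # dynamics (1); equals b * v
        total += c * z_next ** 2 + d * u ** 2 - e * v ** 2
        z = z_next
    return total


def all_path_payoffs(N, zN, k, a, b, c, d, e):
    """Payoffs for all 2**N sign sequences of the maximizer."""
    return [path_payoff(vs, zN, k, a, b, c, d, e)
            for vs in product((-1.0, 1.0), repeat=N)]
```

With, say, `N = 3`, `zN = 0.3`, `k = 0.8`, `a = 1.0`, `b = 0.5`, `c = 1.0`,
`d = 0.2`, `e = 0.1`, every entry of `all_path_payoffs(...)` agrees with
`value_formula(...)` to rounding error, which is the sense in which the
Value depends only on the initial state and the number of stages.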

7.  CONCLUDING  COMMENTS

This  chapter  has  dealt  with  the  solution  to  multistage  scalar 
games  with  linear  dynamics  and  quadratic  payoff  functions.  This  was 
done  for  convenience  rather  than  out  of  necessity,  particularly  in  the 
case of randomized strategies. As Reference 34 indicates, the
solution of convex games is, in theory, identical for finite dimensional
systems; practically speaking, the added notational complexity would
only obscure an already diffuse solution.

Results,  completely  analogous  to  those  obtained  in  this 
chapter,  are  obtained  under  the  assumption  that  the  payoff  is  strictly 
concave  in  the  maximizing  control  variable. 


SECTION  III 


MULTISTAGE  STOCHASTIC  DIFFERENTIAL  GAMES 

1.  INTRODUCTION 

This chapter deals with the multistage, discrete time stochastic
differential game. The dynamics are described by a linear difference
equation having time varying deterministic coefficients, with the
possible exception of a noisy forcing function. Both players wish to
choose controls so as to either maximize (player I) or minimize
(player II) the expected value of a quadratic cost functional. Neither
player can observe the actual state of the system; instead, each player
has an observation of the state which is corrupted by additive noise.

The  purpose  of  this  chapter  is  to  derive  game  optimal  strat- 
egies so  as  to  allow  the  determination  of  the  appropriate  controls  at 
each  stage.  It  is  assumed  that  pure  strategies  exist;  necessary  and 
sufficient  conditions  are  derived  for  this  to  be  true. 

2.  NOTATION 

An N stage game is defined as one requiring that each player
choose a value for his control at N instants of time. Time-to-go,
rather than the usual forward flowing time, is treated as the
independent variable. A subscript i indicates that the subscripted
variable is at the i-th stage. Thus $z_N$ represents the state at the N-th
stage (the initial stage, with N stages to go), while $z_0$ represents the
state of the system at termination.

A superscript is used to indicate the set of present and all
past values of the variable under discussion. Thus $z^1$ is the set of
the present state, $z_1$ (the state when there is one stage-to-go), as
well as all past values $z_2, z_3, \ldots, z_N$, i.e., $z^1 = (z_1, z_2, \ldots, z_N)$.
Naturally, $z^N = z_N$.

A similar convention is used to indicate integration over a set
of variables. Thus $dz_1\, dz_2 \cdots dz_N$ is written $dz^1$. Further, when
several different variables of integration are used, only a single
integral sign is used; the number of integrations is indicated by the
differential. Thus

$$ \int p(z_1, x_1, \ldots, x_N \mid y_1, \ldots, y_N)\, dz_1\, dx_1 \cdots dx_N $$

is written as

$$ \int p(z_1, x^1 \mid y^1)\, d(z_1, x^1) $$

When no limits of integration are specified, the integration is
considered to be from minus infinity to plus infinity.

Quadratic forms are given by

$$ \|x\|_Q^2 = x^T Q x $$

so that a Gaussian probability density is given by

$$ p(x) = \frac{1}{(2\pi)^{n/2} |\sigma^2|^{1/2}} \exp\left\{ -\tfrac{1}{2} \|x - \bar{x}\|^2_{\sigma^{-2}} \right\} $$

where $x$ is a vector with n components, $\bar{x}$ is the mean of $x$, and $\sigma^2$
is the $n \times n$ covariance matrix of $x$. For simplicity, only twice the
negative of the exponent is used when computations are required. Thus
the probability density given above is denoted

$$ p(x) = \|x - \bar{x}\|^2_{\sigma^{-2}} $$

The Gaussian (normal) probability density is also denoted as

$$ x:\ N(\bar{x}, \sigma^2) $$

where $\bar{x}$ and $\sigma^2$ are, respectively, the mean and covariance of the
random variable $x$, i.e.,

$$ \bar{x} = E\{x\} $$

$$ \sigma^2 = E\left\{ (x - \bar{x})(x - \bar{x})^T \right\} $$

where E is the expected value operator.

Various other subscripts are used to identify variables as
required. Thus $\sigma^2_{\eta 2}$ indicates the covariance matrix of the random
variable $\eta$ at stage 2, when two stages-to-go remain.


3.  DERIVATION  OF  PURE  CONTROL  STRATEGIES


The evolution of the state is described by a linear vector
difference equation

$$ z_i = k_{i+1} z_{i+1} + a_{i+1} u_{i+1} + b_{i+1} v_{i+1} + \xi_{i+1}, \quad i = 0, 1, \ldots, N - 1 \tag{1} $$

where

$z_i$ = n component vector representing the state at stage i

$u_i$ = m component vector representing the minimizing
player's control at stage i

$v_i$ = m' component vector representing the maximizing
player's control at stage i

$\xi_i$ = n component vector representing the realization of an
independent noise sequence at stage i

$k_i, a_i, b_i$ = deterministic matrices of the appropriate dimensions

Both players make observations of linear combinations of the
elements of the state vector, but each observation is corrupted by an
additive, independent noise sequence so that

$$ x_i = H_i z_i + \eta_i \tag{2} $$

$$ y_i = G_i z_i + \zeta_i \tag{3} $$

where

$x_i$ = q component vector representing the minimizing player's
observation at stage i

$y_i$ = q' component vector representing the maximizing
player's observation at stage i

$\eta_i$ = q component vector representing the realization of an
independent noise sequence

$\zeta_i$ = q' component vector representing the realization of an
independent noise sequence

$H_i, G_i$ = deterministic matrices of the appropriate dimensions
The Value of the multistage game, $J_N$, is given by


("Min max" could be replaced by "max min" since only pure
strategies are considered.) $u_i$ and $v_i$ can be any vectors of real
numbers; however,

$$ c_i,\ d_i,\ e_i > 0, \quad i = 1, 2, \ldots, N \tag{5} $$

where the inequality in (5) means that each criterion parameter is a
real positive definite matrix. Inequality signs are used as needed
and should be read to mean positive definite (instead of greater than
zero), positive semidefinite (instead of greater than or equal to
zero), etc.

Only $c_1$ need be positive; $c_i$ ($i \ne 1$) need only be nonnegative. If
$c_i$ does equal zero, then the result is a terminal control problem. The
remaining inequalities of (5) are required to make the game
meaningful. If any parameter were to be zero, then either there would
be no effect due to a control at that stage ($a_i$ or $b_i$ equal to zero) or
there would be no cost for using control at that stage ($d_i$ or $e_i$ equal
to zero).


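For convenience, it is assumed that all the criteria parameters
in (5) are symmetric.

Each of the various noises is assumed to have a Gaussian
(normal) probability density as follows:

$$ \xi_i:\ N(0, \sigma^2_{\xi_i}) \tag{6} $$

$$ \eta_i:\ N(0, \sigma^2_{\eta_i}) \tag{7} $$

$$ \zeta_i:\ N(0, \sigma^2_{\zeta_i}) \tag{8} $$

The initial state, $z_N$, also will be a Gaussian random variable:

$$ z_N:\ N(\bar{z}_N, \sigma^2_{z_N}) \tag{9} $$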

It is assumed that both players know the information contained
in (1) through (9). This does not mean that player I knows the actual
value for $x_i$ (the observation of the state at the i-th stage made by
player II) but that he does know the structure of the observation, as
given by (2), and the probability density of the additive noise, as
given by (7).

To actually solve the multistage game, the principle of
optimality of the theory of dynamic programming will be used. To do
this, game optimization is first carried out for the controls chosen
with one stage-to-go, $u_1$ and $v_1$. The technique used is similar, in
terms of the basic structure, to that used to solve multistage
stochastic optimal control problems.$^{36}$ The Value for this one-stage
game is

$$ \begin{aligned} J_1 &= \min_{u_1} \max_{v_1} E\left\{ \|z_0\|^2_{c_1} + \|u_1\|^2_{d_1} - \|v_1\|^2_{e_1} \right\} \\ &= \min_{u_1} \max_{v_1} \int \left( \|z_0\|^2_{c_1} + \|u_1\|^2_{d_1} - \|v_1\|^2_{e_1} \right) p(z_0, u_1, v_1)\, d(z_0, u_1, v_1) \end{aligned} \tag{10} $$

where $p(z_0, u_1, v_1)$ is the joint (Gaussian) probability density function
of $z_0$, $u_1$, and $v_1$. Since$^{37}$

$$ p(z_0, u_1, v_1) = \int p(z_0, z_1, u_1, v_1)\, dz_1 = \int p(z_0 \mid z_1, u_1, v_1)\, p(z_1, u_1, v_1)\, dz_1 \tag{11} $$

Equation (10) can be rewritten as

$$ J_1 = E\left\{ \|\xi_1\|^2_{c_1} \right\} + \min_{u_1} \max_{v_1} \int \left( \|k_1 z_1 + a_1 u_1 + b_1 v_1\|^2_{c_1} + \|u_1\|^2_{d_1} - \|v_1\|^2_{e_1} \right) p(z_1, u_1, v_1)\, d(z_1, u_1, v_1) \tag{12} $$


Noting that

$$ \begin{aligned} p(z_1, u_1, v_1) &= \int p(z_1, x^1, y^1, u^1, v^1)\, d(x^1, y^1, u^2, v^2) \\ &= \int p(u_1, v_1 \mid z_1, x^1, y^1, u^2, v^2)\, p(z_1, x^1, y^1, u^2, v^2)\, d(x^1, y^1, u^2, v^2) \end{aligned} \tag{13} $$

where the entire past history of each player's observations, $x^1$ and
$y^1$, and controls, $u^2$ and $v^2$, has been introduced, (12) can be
rewritten as

$$ J_1 = E\left\{ \|\xi_1\|^2_{c_1} \right\} + \min_{u_1} \max_{v_1} \int \left[ \int \left( \|k_1 z_1 + a_1 u_1 + b_1 v_1\|^2_{c_1} + \|u_1\|^2_{d_1} - \|v_1\|^2_{e_1} \right) p(u_1, v_1 \mid z_1, x^1, y^1, u^2, v^2)\, d(u_1, v_1) \right] p(z_1, x^1, y^1, u^2, v^2)\, d(z_1, x^1, y^1, u^2, v^2) \tag{14} $$

It is at this point that admissible strategies are introduced. Equation
(14) indicates that the probability density for $u_1$ and $v_1$ can be, if
desired, conditioned on the actual value of the state, $z_1$; all or part
of player I's past observations, $y^1$, and controls, $v^2$; and all or part
of player II's past history of observations and controls, $x^1$ and $u^2$,
respectively. Each choice of the characterization of information
available leads to a different problem with different results. Further,
if desired, the allowable structure of the controls may be specified.
(This will be delineated in Chapter 5.)

A very reasonable set of controls may be found which uses
only information reasonably available to each player. That is, player
I may choose a control strategy with one stage-to-go using only his
own history of observations and controls (and, of course, his
knowledge of the dynamics and payoff of the game as given by (1)
through (9)), while player II chooses his control strategy based on his
own history of observations and controls.

Assuming that nonrandomized (pure) control strategies exist
for both players, and denoting them by an overbar, the game optimal
control strategies are given by

$$ \bar{u}_1 = \bar{u}_1(x^1, u^2) \tag{15} $$

$$ \bar{v}_1 = \bar{v}_1(y^1, v^2) \tag{16} $$

Note that nothing has been said at this point about the strategies used
to determine $u^2$ and $v^2$. In particular, note that no assumptions
concerning their optimality have been made.

In view of (15) and (16), let

$$ p(u_1, v_1 \mid z_1, x^1, y^1, u^2, v^2) = \delta(u_1 - \bar{u}_1)\, \delta(v_1 - \bar{v}_1) \tag{17} $$

where $\delta$ represents the Dirac delta (impulse) function. Substituting
(17) into (14) and integrating over $u_1$ and $v_1$ yield

$$ J_1 = E\left\{ \|\xi_1\|^2_{c_1} \right\} + \int \left( \|k_1 z_1 + a_1 \bar{u}_1 + b_1 \bar{v}_1\|^2_{c_1} + \|\bar{u}_1\|^2_{d_1} - \|\bar{v}_1\|^2_{e_1} \right) p(z_1, x^1, y^1, u^2, v^2)\, d(z_1, x^1, y^1, u^2, v^2) \tag{18} $$

where the min max operation no longer need be performed since $\bar{u}_1$
and $\bar{v}_1$ are precisely those control strategies which satisfy the
min max = max min requirement.

Variational arguments can be used to find the actual form of
the game optimal controls based on the strategies allowed by (15) and
(16). Define

$$ I_1(\tilde{u}_1, \tilde{v}_1) = E\left\{ \|\xi_1\|^2_{c_1} \right\} + \int \left( \|k_1 z_1 + a_1 \tilde{u}_1 + b_1 \tilde{v}_1\|^2_{c_1} + \|\tilde{u}_1\|^2_{d_1} - \|\tilde{v}_1\|^2_{e_1} \right) p(z_1, x^1, y^1, u^2, v^2)\, d(z_1, x^1, y^1, u^2, v^2) \tag{19} $$

where $\tilde{u}_1$ and $\tilde{v}_1$ are any admissible strategies. It immediately
follows that

$$ \min_{\tilde{u}_1} \max_{\tilde{v}_1} I_1(\tilde{u}_1, \tilde{v}_1) = I_1(\bar{u}_1, \bar{v}_1) = J_1 \tag{20} $$

Game optimal controls must satisfy the saddle point conditions which
are:

$$ I_1(\tilde{u}_1, \bar{v}_1) \ge J_1 \ge I_1(\bar{u}_1, \tilde{v}_1) \tag{21} $$

where $J_1$ is given by (18).

Let

$$ \tilde{u}_1 = \bar{u}_1 + \varepsilon\, \delta(x^1, u^2) \tag{22} $$

where $\varepsilon$ is a small number and $\delta(x^1, u^2)$ is any real vector function,
of the appropriate dimension, of $x^1$ and $u^2$. Using (22), the left-hand
inequality of (21) can be written as

$$ I_1(\bar{u}_1 + \varepsilon\delta, \bar{v}_1) - J_1 \ge 0 \tag{23} $$

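where the arguments of $\delta$ have been omitted for brevity. Substituting
(18) and (19) into (23) yields

$$ \int \left[ 2 \varepsilon \delta^T \left( a_1^T c_1 k_1 z_1 + (a_1^T c_1 a_1 + d_1) \bar{u}_1 + a_1^T c_1 b_1 \bar{v}_1 \right) + \varepsilon^2 \delta^T (a_1^T c_1 a_1 + d_1) \delta \right] p(z_1, x^1, y^1, u^2, v^2)\, d(z_1, x^1, y^1, u^2, v^2) \ge 0 \tag{24} $$

Using the chain rule of conditional probability densities and
applying it to the probability density of (24) yields

$$ p(z_1, x^1, y^1, u^2, v^2) = p(z_1 \mid x^1, y^1, u^2, v^2)\, p(y^1 \mid x^1, u^2, v^2)\, p(v^2 \mid x^1, u^2)\, p(x^1, u^2) \tag{25} $$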

The means associated with the first three conditional probability
densities of (25) have the following meanings:

$p(z_1 \mid x^1, y^1, u^2, v^2)$ = minimum mean square estimate (MMSE)
of the state, $z_1$, given all past and
present observations and past controls
for both players

$p(y^1 \mid x^1, u^2, v^2)$ = MMSE of all observations of player I, $y^1$,
given player II's past and present
observations and all of both players' past
controls

$p(v^2 \mid x^1, u^2)$ = MMSE of all past controls of player I, $v^2$,
given only player II's past and present
observations and past controls

It is important to note again that (25) involves estimation only. No
assumptions concerning the game optimality of any of the past
controls of either player have been made.


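Substituting (25) into (24) yields, after some slight
manipulation,

$$ \begin{aligned} I_1(\bar{u}_1 + \varepsilon\delta, \bar{v}_1) - J_1 = {} & \int 2 \varepsilon\, \delta^T(x^1, u^2) \left\{ \int \left[ a_1^T c_1 k_1 z_1 + (a_1^T c_1 a_1 + d_1) \bar{u}_1 + a_1^T c_1 b_1 \bar{v}_1 \right] p(z_1 \mid x^1, y^1, u^2, v^2)\, dz_1\; p(y^1 \mid x^1, u^2, v^2)\, dy^1\; p(v^2 \mid x^1, u^2)\, dv^2 \right\} p(x^1, u^2)\, d(x^1, u^2) \\ & + \int \delta^T(x^1, u^2) (a_1^T c_1 a_1 + d_1)\, \delta(x^1, u^2)\, \varepsilon^2\, p(z_1, x^1, y^1, u^2, v^2)\, d(z_1, x^1, y^1, u^2, v^2) \ge 0 \end{aligned} \tag{26} $$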

In view of (5) and, because $\varepsilon$ and $\delta(x^1, u^2)$ are real, it is
clear that the second integral of (26) is positive semidefinite. Invoking
the standard variational arguments of the calculus of variations, it
follows that the coefficient of $\varepsilon\delta(x^1, u^2)$ in (26) must be equal to zero.

If it were not zero, then $\delta(x^1, u^2)$ could be chosen to have the opposite
sign as its coefficient. For $\varepsilon$ small enough, the first integral would
be larger in magnitude than the second and the inequality would not
hold. Thus a necessary condition that a pure strategy exist for
player II is that

$$ \int \left[ a_1^T c_1 k_1 z_1 + (a_1^T c_1 a_1 + d_1) \bar{u}_1 + a_1^T c_1 b_1 \bar{v}_1 \right] p(z_1 \mid x^1, y^1, u^2, v^2)\, dz_1\; p(y^1 \mid x^1, u^2, v^2)\, dy^1\; p(v^2 \mid x^1, u^2)\, dv^2 = 0 \tag{27} $$

(It is assumed that $p(x^1, u^2)$ is nonzero for all values of its arguments.)

Looking now at the right-hand inequality of (21), it is easy to
see that

$$ \begin{aligned} I_1(\bar{u}_1, \bar{v}_1 + \varepsilon\Delta) - J_1 = {} & \int 2 \varepsilon\, \Delta^T(y^1, v^2) \left\{ \int \left[ b_1^T c_1 k_1 z_1 + b_1^T c_1 a_1 \bar{u}_1 - (e_1 - b_1^T c_1 b_1) \bar{v}_1 \right] p(z_1 \mid x^1, y^1, u^2, v^2)\, dz_1\; p(x^1 \mid y^1, u^2, v^2)\, dx^1\; p(u^2 \mid y^1, v^2)\, du^2 \right\} p(y^1, v^2)\, d(y^1, v^2) \\ & - \int \Delta^T(y^1, v^2) (e_1 - b_1^T c_1 b_1)\, \Delta(y^1, v^2)\, \varepsilon^2\, p(z_1, x^1, y^1, u^2, v^2)\, d(z_1, x^1, y^1, u^2, v^2) \le 0 \end{aligned} \tag{28} $$

where $\Delta(y^1, v^2)$ is any real vector function of the appropriate
dimension of $y^1$ and $v^2$. It immediately follows that the necessary
condition for a pure strategy to exist for player I is that

$$ \int \left[ b_1^T c_1 k_1 z_1 + b_1^T c_1 a_1 \bar{u}_1 - (e_1 - b_1^T c_1 b_1) \bar{v}_1 \right] p(z_1 \mid x^1, y^1, u^2, v^2)\, dz_1\; p(x^1 \mid y^1, u^2, v^2)\, dx^1\; p(u^2 \mid y^1, v^2)\, du^2 = 0 \tag{29} $$

and

$$ e_1 - b_1^T c_1 b_1 \ge 0 \tag{30} $$


Equations (27) and (29) can be solved simultaneously for $\bar{u}_1$ and $\bar{v}_1$.

To do this it is useful to introduce a set of linear transformations
defined on suitable Hilbert spaces.$^{38}$ In each of the following
transformations, $\alpha(\cdot)$ is an element in the domain of the transformation
and $\beta(\cdot)$ is an element in its range.

$$ \beta(x^1, y^1, u^2, v^2) = T_{11}\, \alpha(z_1) = \int \alpha(z_1)\, p(z_1 \mid x^1, y^1, u^2, v^2)\, dz_1 \tag{31} $$

where

$$ T_{11}:\ L_2\left[ -\infty, \infty;\ p(z_1 \mid x^1, y^1, u^2, v^2) \right] \to L_2\left[ -\infty, \infty;\ p(y^1 \mid x^1, u^2, v^2) \right] \tag{32} $$

Both the domain and the range of the transformation are thus defined
to be Hilbert spaces, with the appropriate conditional probability
density taken as a measure on the space. By introducing such a
measure, a number of functions, which would not ordinarily be
square integrable when the limits of integration are plus and minus
infinity, can be considered elements of a Hilbert space. In particular,
the element $z_1$ is now square integrable.

It should also be noticed that the range of $T_{11}$ is
multidimensional, since $y^1$ represents the N values of $y_i$. This does not
add any conceptual difficulties although, as is seen later in an
example, the practical problems of evaluation are increased.

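The remaining transformations are

$$ \beta(x^1, u^2, v^2) = T_{12}\, \alpha(y^1) = \int \alpha(y^1)\, p(y^1 \mid x^1, u^2, v^2)\, dy^1 \tag{33} $$

where

$$ T_{12}:\ L_2\left[ -\infty, \infty;\ p(y^1 \mid x^1, u^2, v^2) \right] \to L_2\left[ -\infty, \infty;\ p(v^2 \mid x^1, u^2) \right] \tag{34} $$

$$ \beta(x^1, u^2) = T_{13}\, \alpha(v^2) = \int \alpha(v^2)\, p(v^2 \mid x^1, u^2)\, dv^2 \tag{35} $$

where

$$ T_{13}:\ L_2\left[ -\infty, \infty;\ p(v^2 \mid x^1, u^2) \right] \to L_2\left[ -\infty, \infty;\ p(x^1 \mid y^1, u^2, v^2) \right] \tag{36} $$

$$ \beta(y^1, u^2, v^2) = T_{14}\, \alpha(x^1) = \int \alpha(x^1)\, p(x^1 \mid y^1, u^2, v^2)\, dx^1 \tag{37} $$

where

$$ T_{14}:\ L_2\left[ -\infty, \infty;\ p(x^1 \mid y^1, u^2, v^2) \right] \to L_2\left[ -\infty, \infty;\ p(u^2 \mid y^1, v^2) \right] \tag{38} $$

$$ \beta(y^1, v^2) = T_{15}\, \alpha(u^2) = \int \alpha(u^2)\, p(u^2 \mid y^1, v^2)\, du^2 \tag{39} $$

where

$$ T_{15}:\ L_2\left[ -\infty, \infty;\ p(u^2 \mid y^1, v^2) \right] \to L_2\left[ -\infty, \infty;\ p(y^1 \mid x^1, u^2, v^2) \right] \tag{40} $$

and

$$ \beta(x^1, y^1, u^2, v^2) = T_{16}\, \alpha(z_1) = \int \alpha(z_1)\, p(z_1 \mid x^1, y^1, u^2, v^2)\, dz_1 \tag{41} $$

where

$$ T_{16}:\ L_2\left[ -\infty, \infty;\ p(z_1 \mid x^1, y^1, u^2, v^2) \right] \to L_2\left[ -\infty, \infty;\ p(x^1 \mid y^1, u^2, v^2) \right] \tag{42} $$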

$T_{16}$ and $T_{11}$ differ only in the measure defined on the range space.

For all practical purposes, they are identical since, for the elements
of the domain of $T_{11}$ and $T_{16}$ encountered in this problem, an element
in the range of one is also an element in the range of the other. This
is, of course, not an intrinsic property of linear transformations but
is a direct consequence of the simple structure of the problem under
consideration.

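Using (31) through (42), it is possible to write (27) and (29) as
linear operator equations as follows:

$$ a_1^T c_1 k_1 T_{13} T_{12} T_{11} z_1 + (a_1^T c_1 a_1 + d_1)\, \bar{u}_1 + a_1^T c_1 b_1 T_{13} T_{12}\, \bar{v}_1 = 0 \tag{43} $$

$$ b_1^T c_1 k_1 T_{15} T_{14} T_{16} z_1 + b_1^T c_1 a_1 T_{15} T_{14}\, \bar{u}_1 - (e_1 - b_1^T c_1 b_1)\, \bar{v}_1 = 0 \tag{44} $$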
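
As a sanity check on the structure of (43) and (44), consider the
special case in which everything is scalar and both players observe
the state exactly, so that every $T$ operator acts as the identity. That
special case is an assumption made here for illustration and is not
treated in the text; the function names below are likewise
hypothetical. The two equations then reduce to a 2 by 2 linear system,
which the sketch solves by Cramer's rule.

```python
def scalar_saddle(k, z, a, b, c, d, e):
    """Solve the scalar, perfect-information version of (43) and (44).

    With every T operator replaced by the identity the equations read
        a*c*k*z + (a*a*c + d)*u + a*c*b*v = 0
        b*c*k*z + b*c*a*u - (e - b*b*c)*v = 0
    Requires a*a*c + d > 0 and e - b*b*c > 0, cf. (53) and (54).
    """
    m11, m12 = a * a * c + d, a * b * c
    m21, m22 = a * b * c, -(e - b * b * c)
    r1, r2 = -a * c * k * z, -b * c * k * z
    det = m11 * m22 - m12 * m21     # strictly negative under the conditions
    u = (r1 * m22 - m12 * r2) / det
    v = (m11 * r2 - m21 * r1) / det
    return u, v


def payoff(u, v, k, z, a, b, c, d, e):
    """One-stage quadratic payoff without the additive noise term."""
    return c * (k * z + a * u + b * v) ** 2 + d * u * u - e * v * v
```

With `k = z = a = b = c = d = 1` and `e = 2`, the solution is
`u = -2/3`, `v = 1/3`; perturbing `u` away from this value raises the
payoff and perturbing `v` lowers it, which is the saddle point property
(21) in this reduced setting.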


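Solving (43) and (44) simultaneously for $\bar{u}_1$ and $\bar{v}_1$ yields

$$ \bar{u}_1 = -\left[ I - (a_1^T c_1 a_1 + d_1)^{-1} a_1^T c_1 b_1 (b_1^T c_1 b_1 - e_1)^{-1} b_1^T c_1 a_1 T_{13} T_{12} T_{15} T_{14} \right]^{-1} (a_1^T c_1 a_1 + d_1)^{-1} a_1^T c_1 \left[ T_{13} T_{12} T_{11} + b_1 (e_1 - b_1^T c_1 b_1)^{-1} b_1^T c_1 T_{13} T_{12} T_{15} T_{14} T_{16} \right] k_1 z_1 \tag{45} $$

$$ \bar{v}_1 = \left[ I - (b_1^T c_1 b_1 - e_1)^{-1} b_1^T c_1 a_1 (a_1^T c_1 a_1 + d_1)^{-1} a_1^T c_1 b_1 T_{15} T_{14} T_{13} T_{12} \right]^{-1} (e_1 - b_1^T c_1 b_1)^{-1} b_1^T c_1 \left[ T_{15} T_{14} T_{16} - a_1 (a_1^T c_1 a_1 + d_1)^{-1} a_1^T c_1 T_{15} T_{14} T_{13} T_{12} T_{11} \right] k_1 z_1 \tag{46} $$

where I is an appropriately dimensioned unit matrix.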

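Since $T_{16} z_1$ is an element of the range space containing $T_{11} z_1$,
(45) and (46) can be (slightly) rewritten as

$$ \bar{u}_1 = -\left[ I - (a_1^T c_1 a_1 + d_1)^{-1} a_1^T c_1 b_1 (b_1^T c_1 b_1 - e_1)^{-1} b_1^T c_1 a_1 T_{13} T_{12} T_{15} T_{14} \right]^{-1} (a_1^T c_1 a_1 + d_1)^{-1} a_1^T c_1 T_{13} T_{12} \left[ I + b_1 (e_1 - b_1^T c_1 b_1)^{-1} b_1^T c_1 T_{15} T_{14} \right] T_{11} k_1 z_1 \tag{47} $$

$$ \bar{v}_1 = \left[ I - (b_1^T c_1 b_1 - e_1)^{-1} b_1^T c_1 a_1 (a_1^T c_1 a_1 + d_1)^{-1} a_1^T c_1 b_1 T_{15} T_{14} T_{13} T_{12} \right]^{-1} (e_1 - b_1^T c_1 b_1)^{-1} b_1^T c_1 T_{15} T_{14} \left[ I - a_1 (a_1^T c_1 a_1 + d_1)^{-1} a_1^T c_1 T_{13} T_{12} \right] T_{11} k_1 z_1 \tag{48} $$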

Equations (47) and (48) represent, in functional form, the
game optimal control strategies to be used by each player. But,
unless a way to evaluate the inverses is found, the solutions are
formal and essentially meaningless. Happily, they can be evaluated,
under certain circumstances, in an infinite series. This series, a
Neumann expansion, converges (the inverse exists and has meaning)
whenever the norm of the second term of the inverse is less than one.
In other words,

$$ (I - RT)^{-1} = I + RT + (RT)^2 + (RT)^3 + \cdots \tag{49} $$

whenever the norm of RT, denoted $\|RT\|$, is less than one. For the
present case, the inverses exist whenever

$$ \left\| (a_1^T c_1 a_1 + d_1)^{-1} a_1^T c_1 b_1 (b_1^T c_1 b_1 - e_1)^{-1} b_1^T c_1 a_1 T_{13} T_{12} T_{15} T_{14} \right\| < 1 \tag{50} $$

$$ \left\| (b_1^T c_1 b_1 - e_1)^{-1} b_1^T c_1 a_1 (a_1^T c_1 a_1 + d_1)^{-1} a_1^T c_1 b_1 T_{15} T_{14} T_{13} T_{12} \right\| < 1 \tag{51} $$

where the norm of a transformation RT is given by

$$ \|RT\| = \sup_{\alpha} \frac{\|RT\alpha\|}{\|\alpha\|} \tag{52} $$

and $\alpha$ is any nonzero element of the domain of the transformation RT.
Naturally, the norms of $\alpha$ and $RT\alpha$ are computed according to the
weighting function defined on the appropriate Hilbert space.
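
The Neumann expansion (49) is easy to see at work in finite dimensions.
The sketch below uses a small matrix as a stand-in for the operator RT;
that substitution, and the helper names, are assumptions made for
illustration, since the operators in (50) and (51) act on function
spaces. It sums the series term by term and can be compared with the
exact inverse of $I - R$.

```python
def identity(n):
    """n-by-n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(A, B):
    """Entrywise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def neumann_inverse(R, terms=60):
    """Partial sum I + R + ... + R**(terms - 1), approximating (I - R)^-1.

    The partial sums converge when the norm of R is less than one.
    """
    n = len(R)
    total = identity(n)
    power = identity(n)
    for _ in range(terms - 1):
        power = mat_mul(power, R)
        total = mat_add(total, power)
    return total
```

For `R = [[0.2, 0.1], [0.0, 0.3]]`, whose norm is well below one, sixty
terms already reproduce the inverse of $I - R$ to machine precision. When
the norm is close to one the series converges slowly, and when it
exceeds one the partial sums need not converge at all, which is why (50)
and (51) appear among the existence conditions.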

Thus sufficient conditions for the existence of game optimal
controls are (43), (44), (50), and (51), and

$$ a_1^T c_1 a_1 + d_1 > 0 \tag{53} $$

$$ e_1 - b_1^T c_1 b_1 > 0 \tag{54} $$

where the strict inequality has replaced the positive semidefiniteness
of (30).

When $\bar{u}_1$ and $\bar{v}_1$ exist, it is a straightforward (although
extremely tedious) task to verify that the following assumption is
valid: each involves only the past and present observations and past
controls available to the appropriate player. The required admissible
game optimal control strategies thus have been found for the last stage
of an N stage game. There is still the matter of actually evaluating
the various conditional probability densities; this is discussed later.

Having  found  the  game  optimal  controls  for  the  last  stage,  it  is 
now  possible  to  use  the  principle  of  optimality  of  the  theory  of  dynamic 
programming to find the game optimal control strategies at stages 2,
3, ..., N.

In other words, the game optimal control strategies are chosen
to optimize the payoff resulting from the application of controls at
stages 1 and 2. Thus controls are applied at stage 2 which have an
effect on the payoff and serve to change the state. Whatever the state
resulting (and whatever the actual observations occurring at stage 1),
game optimal control strategies u_1 and v_1 will be used. Symbolically,
this  is  written 


Analogous  to  (11) 


Substituting  (56)  into  (55)  and  integrating  over  yield 


min  max 


j2=eJ||x2||‘_|+u2  v2 


2 I 


z2+a2u2+b2v2Hc_+lKHd  - l|v2lle 

Cm  b ' 


X p(z2>u2,v2)  d(z2>  u2,  v2)  + Jj 


(57) 


Consider J_1, as given by (18), where u_1 and v_1 are given by (47) and
(48). Since


/ 1 1 2 2 \ f I 1 1 2 2 \ 

>(VX  *V  >u  ’v  ) = J P(Z1’Z2*X  *u  'v  ) 


dz_ 


(58) 


J 


Substituting  (58)  through  (62)  back  into  (18)  yields 


where 


Ji  = E{^xi  llct  j+/Yi  (*2.x2.y2.u2,v2) 
Xp(z2,x2,y2lU2,v2)d(z2,x2.y2,u2,v2) 

Y,  (z2  .*2  ,y2  ,u2,v2  ) =J J ||kj  Zj+ajUj  +b,  vj  |2^+  ||Ul  ll^-Hvj  1 12 

x P(Xj  I Zj  ) dx j p(yj  | Zj  ) dYl 


(63) 


X p(zl  I z2’u2'v2  )dzl 


(64) 


Noting  that 


>(z2  > u2  , v2  ) = /"p(z2  , x2  , y2,  u2,  v2)  d(x2,  y2,  u3, 


V3) 


(65) 


Equation (55) can be rewritten as


J2  = E { IU,  M^  + I|x2  11^  j + u2  v2  /||,k 


2*2+.2u2+b2v2l|^  t||u2l|2 

2 


^V2  l*e2  + Yi(z2’x2’y2>u2-v2)jp(*2'x2>y2-u2.v2) 


v ./  2 2 2 2\ 
X d(z2,x  ,y  ,u  , v ) 


(66)


To  find  the  game  optimal  control  strategies  at  stage  2,  the 
same  procedure  used  to  find  the  game  optimal  controls  at  stage  1 is 
applied.  That  is,  assume  pure  control  strategies  exist,  as  in  (12). 
Defining 


i2(a2,v2)=Ej|K1||^t|k2||2jtj'j||k 


2Z2+a2ff2+b2  V2^c2  + ^2^d2 


+ Y1(z2^2-y2.u2,u3,v2.v3)jp(z2,x2,y2,u3,v3) 


„ ,1  2 2 3 3\ 

Xd(Vx  ,y  ,»  ,v  ) (67, 

analogous  to  (19),  use  the  saddle  point  condition  in  both  directions 


    I_2(u_2 + \epsilon\delta(x^2, u^3), v_2) - J_2 \geq 0        (68)

    I_2(u_2, v_2 + \epsilon\Delta(y^2, v^3)) - J_2 \leq 0        (69)


Linear transformations similar to those given in (31) through (41) are
used to generate a pair of simultaneous linear operator equations
which are then solved simultaneously to find u_2 and v_2.

At  this  point,  the  procedure  has  become  perfectly  general. 

The  procedure  at  any  stage  is  then  given  by 
i 


i , ) min  max  f ( , , 

Ji=E|Z  l,xjllc.j+ui  vi  Ji l^iVWVillc.  + IKHd. 


Ivil'e  + y3 
i 


I i i i i \ I / iiiii 
'i_i(zi*x  -y  >u  *v  )jp(zi-x  .y  .u  ,V  ) 

X d x1 , y1,  u1,  V1) 


(70) 




where 


,i.1(»1,x‘,y,.„‘.v‘)  = /{||k._l.i_1+.i_1u._1+bi_Ivi_1||J  «||„  \f 

J ' 1-1  1-1 


l,Vl,,ei_1+V2K-l'xl‘1*yX'l*ui-l*ul'vi-l'vl)j 


p(zi-l 


i-1  i-1  i i 


„ - 1 MjI  1-1  1-1  1 l \ 

,x  > y .U  ,V  Jd^^.x  , y ,u  ,v  ) 

(71) 


At each point the saddle point conditions

    I_i(u_i + \epsilon\delta(x^i, u^{i+1}), v_i) - J_i \geq 0        (72)

    I_i(u_i, v_i + \epsilon\Delta(y^i, v^{i+1})) - J_i \leq 0        (73)

are employed to generate the necessary conditions for game optimal
strategies.

While the outline of the optimization problem is straightforward,
the actual evaluation of the strategies, even for the easiest
case of Gaussian random variables, is extremely tedious.

IV.  GENERATION  OF  THE  REQUIRED  PROBABILITY  DENSITY 
FUNCTIONS 

Some  of  the  required  probability  density  functions  are  quite 
easy  to  express.3'  For  example,  from  (2)  and  (3),  using  the  notation 
discussed  in  Section  II, 


p(x- 1 z.)  = IIx.-g.z.  |(!: . 


i l 


i i i"a 


T); 


(74) 


p(yi|z.)  = |ly.  - H.z.II^.2 


Also,  from  (1 ), 


p(zi'zi+rui+r Vi+i)  = 1|zi-ki+izi+rai+iui+rbi+ivi+illa -2 

1+1 

The remaining conditional probabilities can be found recursively by
using the chain rule for conditional densities in combination with
Bayes's rule (Reference 37).

An auxiliary conditional density function is first found

    p(z_i, x^i, y^i | u^{i+1}, v^{i+1}) = \int p(z_i, z_{i+1}, x^i, y^i | u^{i+1}, v^{i+1}) dz_{i+1}
        = \int p(z_i, x_i, y_i | z_{i+1}, x^{i+1}, y^{i+1}, u^{i+1}, v^{i+1})
          \times p(z_{i+1}, x^{i+1}, y^{i+1} | u^{i+1}, v^{i+1}) dz_{i+1}        (77)

But

    p(z_i, x_i, y_i | z_{i+1}, x^{i+1}, y^{i+1}, u^{i+1}, v^{i+1})
        = p(x_i, y_i | z_i, z_{i+1}, x^{i+1}, y^{i+1}, u^{i+1}, v^{i+1})
          \times p(z_i | z_{i+1}, x^{i+1}, y^{i+1}, u^{i+1}, v^{i+1})
        = p(x_i | z_i) p(y_i | z_i) p(z_i | z_{i+1}, u_{i+1}, v_{i+1})        (78)

where (74) through (76) are the justification for saying that the
conditional densities of x_i and y_i, given z_i, depend on nothing else and
that the conditional density for z_i, given z_{i+1}, u_{i+1}, and v_{i+1}, is not
changed if more information is available.

Also, it is easy to see that

    p(z_{i+1}, x^{i+1}, y^{i+1} | u^{i+1}, v^{i+1}) = p(z_{i+1}, x^{i+1}, y^{i+1} | u^{i+2}, v^{i+2})        (79)

since the value of a control chosen at the (i+1)st stage can yield no
information concerning either the state or an observation of the state.
It  must  be  reiterated  that  these  conditional  densities  do  not  involve  any 
assumptions  concerning  the  optimality  of  controls  chosen  in  the  past. 

In particular, for purposes of estimation, which is what the
conditional densities actually represent, no assumptions concerning
strategies are required. When the probability density of one or more
random variables is conditioned on one or more values of another
variable, these conditioning variables enter the density function only
as specific values. In this case, it means that one need only specify
a set, any set, of values for u^{i+1} and v^{i+1} and then evaluate
p(z_{i+1}, x^{i+1}, y^{i+1} | u^{i+1}, v^{i+1}). When this is done, it can be seen that
the values for u_{i+1} and v_{i+1} do not appear in the conditional density
function.

Making  use  of  (77)  and  (78)  allows  the  rewriting  of  (77)  as 


    p(z_i, x^i, y^i | u^{i+1}, v^{i+1}) = \int p(x_i | z_i) p(y_i | z_i) p(z_i | z_{i+1}, u_{i+1}, v_{i+1})
        \times p(z_{i+1}, x^{i+1}, y^{i+1} | u^{i+2}, v^{i+2}) dz_{i+1}        (80)


which is the desired recursion relationship for p(z_i, x^i, y^i | u^{i+1}, v^{i+1}).
To start (80) off, note that

    p(z_N, x^N, y^N | u^{N+1}, v^{N+1}) = p(z_N, x_N, y_N)
        = p(x_N, y_N | z_N) p(z_N)
        = p(x_N | z_N) p(y_N | z_N) p(z_N)

where the first two conditional densities are given by (74) and (75) and
the last one is given by (9). Thus it is possible to compute
p(z_i, x^i, y^i | u^{i+1}, v^{i+1}) for i = 1, 2, ..., N.

The  required  conditional  probability  densities  are  then: 


    p(z_i | x^i, y^i, u^{i+1}, v^{i+1}) = \frac{p(z_i, x^i, y^i | u^{i+1}, v^{i+1})}{\int p(z_i, x^i, y^i | u^{i+1}, v^{i+1}) dz_i}

    p(y^i | x^i, u^{i+1}, v^{i+1}) = \frac{\int p(z_i, x^i, y^i | u^{i+1}, v^{i+1}) dz_i}{\int p(z_i, x^i, y^i | u^{i+1}, v^{i+1}) d(z_i, y^i)}        (83)

    p(x^i | y^i, u^{i+1}, v^{i+1}) = \frac{\int p(z_i, x^i, y^i | u^{i+1}, v^{i+1}) dz_i}{\int p(z_i, x^i, y^i | u^{i+1}, v^{i+1}) d(z_i, x^i)}        (84)
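
The normalization used in these conditional densities (a joint density divided by its integral over the state) can be sketched for the simplest case. The Python fragment below is a hedged illustration, not the report's computation: it assumes a single-stage scalar problem with G = H = 1, a zero-mean Gaussian prior on z, and arbitrary illustrative variances, and it replaces the integral over z by a sum over a grid. All function names are hypothetical.

```python
import math

def gauss(value, mean, var):
    # Scalar Gaussian density with the given mean and variance.
    return math.exp(-0.5 * (value - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posterior_weights(x_obs, y_obs, prior_var, var_x, var_y, grid):
    # Joint density p(x | z) p(y | z) p(z) evaluated on the grid of z values,
    # then normalized by its sum (a discrete stand-in for the integral over z).
    joint = [gauss(z, 0.0, prior_var) * gauss(x_obs, z, var_x) * gauss(y_obs, z, var_y)
             for z in grid]
    total = sum(joint)
    return [w / total for w in joint]

GRID = [-8.0 + 0.01 * i for i in range(1601)]
weights = posterior_weights(1.0, 0.4, 1.0, 0.5, 0.25, GRID)
grid_mean = sum(z * w for z, w in zip(GRID, weights))

# Known closed form for Gaussian fusion: precision-weighted average of the data.
exact_mean = (1.0 / 0.5 + 0.4 / 0.25) / (1.0 / 1.0 + 1.0 / 0.5 + 1.0 / 0.25)
```

The grid estimate and the closed-form mean agree closely, which is a useful sanity check before attempting the multistage recursion.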


The last two sets of conditional densities, p(v^i | x^{i-1}, u^i) and
p(u^i | y^{i-1}, v^i), must be handled somewhat differently.

    p(v^i | x^{i-1}, u^i) = \frac{p(x_{i-1}, v_i, v^{i+1} | x^i, u^i)}{\int p(x_{i-1}, v_i, v^{i+1} | x^i, u^i) dv^i}        (85)

where

    p(x_{i-1}, v_i, v^{i+1} | x^i, u^i) = p(x_{i-1}, v_i | x^i, u^i, v^{i+1}) p(v^{i+1} | x^i, u^i)        (86)


Looking first at the second conditional probability density on the
right side of (86), it is clear that

    p(v^{i+1} | x^i, u^i) = p(v^{i+1} | x^i, u^{i+1})        (87)


since  knowledge  of  the  value  of  a later  control,  for  either  player,  can 
have  no  effect,  can  yield  no  information,  on  the  estimate  of  the  value 
for  an  earlier  control  unless  there  is  some  a priori  known  functional  or 
statistical  relationship. 

The first conditional probability density on the right side of (86)
can be written as

    p(x_{i-1}, v_i | x^i, u^i, v^{i+1}) = p(x_{i-1} | x^i, u^i, v^i) p(v_i | x^i, u^i, v^{i+1})        (88)


The first conditional density on the right side of (88) presents no real
conceptual difficulties. It is generated by

    p(x_{i-1} | x^i, u^i, v^i) = \frac{\int p(z_{i-1}, x^{i-1}, y^{i-1} | u^i, v^i) d(z_{i-1}, y^{i-1})}{\int p(z_{i-1}, x^{i-1}, y^{i-1} | u^i, v^i) d(z_{i-1}, x_{i-1}, y^{i-1})}        (89)

The second conditional density on the right side of (88) is not as
straightforward. In fact,

    p(v_i | x^i, u^i, v^{i+1}) = p(v_i)        (90)


To understand what is meant by (90), a clear understanding of the
principle of optimality is required. Roughly, it states that, no matter
what has occurred in the past, the best that can be done is to choose
the controls in an optimal fashion in the future. With regard to the
present problem, it means that a player need not (actually should not)
assume that his opponent has used an optimal strategy or even that his
opponent has used any nonrandom strategy whatsoever. Naturally, each
player has a complete record of his own observations and past controls,
but, even were he to be given a complete list of his opponent's control
values, v^{i+1}, in the absence of any strategy which relates the opponent's
control strategy to his (the opponent's) observations or to any other set
of data, they can provide no hint as to what v_i will be. The only
information that the opposing player can count on is data concerning
physical bounds on the magnitude of the control available at the ith
stage. Accordingly, the a priori probability density for v_i is actually
a uniform distribution over the physical limits known to exist. It
should be stressed that this does not mean that one player believes that
his opponent should have or would have chosen his ith control from a
uniform distribution; rather, it reflects the very limited knowledge
available to a player about his opponent's real choice. It is merely the
best reliable information present.

Because this problem is addressed to Gaussian random variables,
it makes sense to approximate the uniform distribution over a
bounded set of values by a Gaussian distribution over an infinite set of
values. A reasonable choice would be one with the same mean as the
uniform distribution and with a variance such that the bounds of the
uniform distribution are equal to plus and minus one standard deviation
of the Gaussian distribution. Such a choice yields a relatively constant
probability density function over the bounds of the uniform distribution.
(The choice of a Gaussian random variable, instead of the uniform, is
made only for convenience. Theoretically, there is no reason why the
uniform distribution should not be used.)

If no information as to capability is available, then the obvious
choice for (90) is a density whose variance is, in the limit, infinite.
Working with such variances leads to no difficulties.
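
The Gaussian stand-in for a bounded control can be made concrete. The following Python sketch assumes hypothetical control bounds of -2 and +2 (not values from the report) and uses illustrative function names; it builds the Gaussian whose mean is the midpoint of the bounds and whose standard deviation is the half-width, as described above.

```python
import math

def gaussian_for_bounds(lower, upper):
    # Mean at the midpoint; the bounds sit at plus and minus one standard deviation.
    mean = 0.5 * (lower + upper)
    sigma = 0.5 * (upper - lower)
    return mean, sigma

def gauss_pdf(value, mean, sigma):
    return math.exp(-0.5 * ((value - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

mean, sigma = gaussian_for_bounds(-2.0, 2.0)
# Ratio of the density at a bound to the density at the mean.
ratio_at_bound = gauss_pdf(2.0, mean, sigma) / gauss_pdf(mean, mean, sigma)
```

The ratio equals exp(-1/2), roughly 0.61, whatever the bounds are, so under this choice the density at the bounds is about 61 percent of its peak value.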

Substituting (86), (87), (88) and (90) back into (85) leads to the
required recursive formulation:

    p(v^i | x^{i-1}, u^i) = \frac{p(v_i) p(x_{i-1} | x^i, u^i, v^i) p(v^{i+1} | x^i, u^{i+1})}{\int p(v_i) p(x_{i-1} | x^i, u^i, v^i) p(v^{i+1} | x^i, u^{i+1}) dv^i}        (91)

where (89) is used to compute p(x_{i-1} | x^i, u^i, v^i). To start (91), set

    p(v^{N+1} | x^N, u^{N+1}) = \delta(v_{N+1})        (92)

where \delta(v_{N+1}) is the Dirac delta function.


5.  EXAMPLE  1


The amount of work presently involved in actually solving an N
stage stochastic game is tremendous. For this reason, example 1
involves only a two-stage scalar game since this serves to illustrate
completely how the N stage vector game is solved.

All random variables are assumed to be Gaussian. Both players
know the mean and variance of each random variable. Denoting each
distribution by N(\mu, \sigma^2), where \mu is the mean and \sigma^2 the variance, the
required a priori probability densities are

kn  : N ( '%2  ) 

\ ‘N  (»•■’!.  ) 

"i  :N(°-'V  ) 

) 

' 1 

v*,N(v°4) 

“2  : N (v  °u2  ) 

(\bar{v} and \bar{u} here are merely the a priori means of the control capability
and, for most problems, would be zero.) Both players know (93)
through (98).


(93) 

(94) 

(95) 

(96) 

(971 

(98) 




Using (93) through (98) and the formulas developed in Section IV,
it is a straightforward task to generate the required conditional
probability densities.

System  dynamics  and  observations  are  given  by  (1),  (2),  and 

(3)  with 

    G_i = H_i = 1        (99)

To write out the conditional densities required to evaluate the
transformations, a number of auxiliary variables are defined:

The first conditional density required is p(z_1 | x^1, y^1, u_2, v_2),
which is needed to evaluate T_{11} (or T_{16}) as given in (31).


T11:p(z1|x1.y,.u2.v2  )=  - 


G - A(L  + D+M) 

A 


| EGx2  + FGy,+ALx1+AMy1  + (AQ-OG)  u2+(AR-GP)  vyGN  | 
\ 1 G2-A(L+D+M)  ) 


or,  making  the  obvious  substitutions, 


T11:p(z1|x1,y1,u2,v2) 


G - A(L-fD-iM) 
A 


x |z j-a 1x2*a2y2'a3x i'a4y4'a5u2_a 6V2‘a7  | 


The evaluation of T_{12} is simplified by considering it to be the
product of two other transformations, T_{12}^1 and T_{12}^2. Since

    T_{12}: p(y^1 | x^1, u_2, v_2) = p(y_1 | x^1, y_2, u_2, v_2) p(y_2 | x^1, u_2, v_2)

it follows that

    T_{12} = T_{12}^2 T_{12}^1

where

    \theta(x^1, y_2, u_2, v_2) = T_{12}^1 \alpha(y_1) = \int \alpha(y_1) p(y_1 | x^1, y_2, u_2, v_2) dy_1        (121)

    \theta(x^1, u_2, v_2) = T_{12}^2 \alpha(y_2) = \int \alpha(y_2) p(y_2 | x^1, u_2, v_2) dy_2        (122)

    T_{12}^1: L_2[-\infty, \infty; p(y_1 | x^1, y_2, u_2, v_2)] \to L_2[-\infty, \infty; p(y_2 | x^1, u_2, v_2)]        (123)

    T_{12}^2: L_2[-\infty, \infty; p(y_2 | x^1, u_2, v_2)] \to L_2[-\infty, \infty; p(v_2 | x^1, u_2)]        (124)

1 1 m[g2-A(L+D)] 

T12:p(ViIx  .y2'u2*v2)='-l * 

G4-A(L+D+M) 


( EGx.-fFGy , + A Lx  +( AQ-  DG)  u 4(AR-GP)v  +GN  )2 

'pi* = 1 = — 

( G - A(  L-(  D)  \ 


Cl  25) 


t12  spfyjx1.  y2.  u2.  v2) 


M [g2-A(L+D)J 
G2-  A(L  + D+M) 


' |yrn8X2*Q9y2-Q10Xl-allu2-a12v2-al3l 


(126) 


2 i v F [_G  +(F-A)(L*D) 


T i ? : p(y 7 1 x • u? ) = 


G - A(L4D) 


| ^ E(L+D)  x2+LGXj+  [gQ-(L  + D)  o]u24  [gR- ( L+D) p]v  , +(L+D)N  j2 

' 2 G24(F-A)(L+D)  i 

(127)


or 


Tl22:p(y2(xl,u2)  = 


F [g2  + (F-A)(L+D)J 


G“-  A (L+D) 


X | y2‘a14X2'a  15Xra16U2'a  1 7V 2 * a 1 8 | 


(118) 


s[g2  + (F-A)(L+D)]  + b 2 LG2  + (F  - A )]  | 


T\y  p(v2Ix1  ' u2^ 


v2-  /-  LEPx. 


G"  + (F-A)(L+D)  ( 

- b , l[g'”  + (F-A)J  Xj  -a,  b.,  L,  G“  + (F  - A )1  u? 

+s[g2  + (F-A)(L+D)J  v2 

-b,  LGN  )/(S  [G2  + (F-A)(L+D)]+b2LLG2+(F-A)]  ) ! 


(119) 


or 


T 1 3 :p(v2  I X ’ u2^ 


s[g2  + (F- A)(L+D)]+b  2L  [g2  + (F-A)  dJ 


G +(F- A)(L4D) 


X | VZ*Q 19X2‘a20Xl "°2 lU2'a22 | 


(120) 


Defining T_{14}^1 and T_{14}^2 analogously to (117)


Tl14sP(Xllx2*Y1-u2*v2)  = 


l[g2-  A(M  + D)] 
G - A(E+D  + M) 


I EGx2+FGV2+AMyi  + <A^-OG)uo+(AR-GP)u,-fGN 

{ X . + - — . 1 ^ c \ 


G -A(M+D) 


T 14 :p(X  1 I x2’  y1’  u2'  V2  ) = 


L , G 


:-A(M  + D)J 


2’  2 ' ~ 2 

G^-A(L+D+M) 


X |xi-Ci1x2-92y2-03yr04U2'05VrBb 


„2  , , 1 , e[g2  + (E-A)(M+D) 

T 1 4 :P(X2  y >u2  ’V2l=  2 

G - A(M+D) 


j F(M+D)y  +MGy  + [gQ- (M+D)o]u  +[gR  - (M+D)P\- 

K V 4- 1 ^ L 


* V 


G + (E-A)(M+D) 


or 


2 / , 1 i e[g2+(E-A)(M+D)] 

T14:p(x2ly  ,u2,vj=  2 


G - A(M+D) 


X|X2‘‘17>,2-Vre9u2-Sl0v2-eil!2 


(128) 


(129) 


2+(M+D)N  | 

I 

(130) 


(131)


and, finally,


. TC2+(E-A)(M+D)'+a?2M[c  2+(E-A)L  , 

T1  5: P(U2  I y ,VZ  ' = 2 ]u  - (-MFOy 

G +(E-A)(M+D)  ( 2 \ i 

-a2M[G2-(E-A)]  yj  -a2b2M[G2  + (E-A)]  v2  + t[g2  + (E-A)(M+D)]  u 


-a2MGN  V//t|G2+(E-A  )(M+D)]  + a2M  [g2+(E-A) 


,|  _2 


j TjG“  + (E-A)(M+D)]  + a2M  G2+(E-A) 

Ti5;p(u2ly  1 v2  ^ = 2 --- 

G + (E- A)  (M-t- D) 


(132) 


X | x2't!12y2'013yrei4v2'ei5  I 


(133) 


I 


It is now possible to evaluate u_1 and v_1. To illustrate how
this is done, consider u_1 as given by (47).


T T 
1 131  12 


where


1 x 1 i — t T 

J+  ,2  1 1 5 1 1 4 

erb  j Cj 


Tllzl  =05x2+96Xl+67U2+98 


e5  = a 1 +y4'7  8
(134)


+<  y2  + y4"9*°  14+  _1’6  + 'r4a  12+^  'r2+  '4a9*  0 1 7]a  19  (135) 


6 6 = ri3+  '/4a  1 0+^  v2  + ^4n  9^  a 1 5 + _1*  6+y4a  1 2+^Y2+V4°  9^ 0 1 7]° 20  (136-) 


- — 


A 


(146) 


The next concern is with the elements of the expansion appearing
in the Neumann series. Equation (134) indicates that the
transformations operate on \theta_5 x_2 + \theta_6 x_1 + \theta_7 u_2 + \theta_8. Actually performing the
indicated transformations leads to

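    T_{13} T_{12} T_{15} T_{14} (\theta_5 x_2 + \theta_6 x_1 + \theta_7 u_2 + \theta_8) = (\theta_5\phi_5 + \theta_6\phi_1 + \theta_7\phi_9) x_2
        + (\theta_5\phi_6 + \theta_6\phi_2 + \theta_7\phi_{10}) x_1 + (\theta_5\phi_7 + \theta_6\phi_3 + \theta_7\phi_{11}) u_2
        + \theta_5\phi_8 + \theta_6\phi_4 + \theta_7\phi_{12} + \theta_8        (137)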


where 

9i0Q8+(”9+910a9>a14+[9ll  + 910a12  + (99+ei0a9)a17]a19 
°2~  6 10a9+^99+9  10a9*  0 15+  [9  1 1 + S 10a  12  + (99+910a9^  “'h]0  20  (139) 

°3  = 910a  1 1 + (99+9  10a9^  l6+[9  1 1+9  10°  12+(99  + 910a9*a17]a2  1 (140) 

'i4  = 9 12+9  1 0a  1 3+  ^ 9 9+  9 10a9^  18+[9  1 1+9  10a  12+la9+9  10a9'a  n]  a22  f141'1 

°9=e2+tsie7+(&4+Bl69)  ei2  CU2) 

°10=  03+BlB8+(64+BlB9)  9 1 3 (14-7) 

911=  e5+9iei0+(94+9l99)  9I4  C144) 

012=  e6+PlBll+(B4+ 9ie9}  915  (145)
(165)


V5  013f0l6a19 


a>  9 •)  fl  a 

6 14  16u20 


V7  015+0l6n21 


';,8  ei740U,Q22 


9ir  (Wl  3hv[02+09012  + (08+09013)a9]a14 


0 1 4 (tV  0l)0l  3*°  10* 


0 1 5 ^08t0‘)0l  3^a  1 1 + 


[02+e9012  + (08+09013,Q9]a 
V09012  + (08+09013)Q9]a 


15 


It) 


0 1 (,  0 1 0*  090  1 4* ( 08+0‘>0  I 3>  ° 12+  [02+0^0  1 2+  ( *V  090  1 3 > a 9]  0 1 7 
0 1 7 


*1  0nV(012+013°9)o14'  [014+013°12+(0r2+0l  3°  Q > a w]* 


19 


®10  0M°  1O+(012+01  3n9,n15+  0 1 4*  0 1 3°  12+(0l2+013n9)a  1 7] a2( 


11  0l  3°  1 l^012f0l  3a9*n  U>+ 


0 1 4*  0 1 3n  12^012+013(19*"  1 7 J a 2 1 


® 1 2 B1540H^3+(012+013a9)al8+[014+013O12+(012+013a9)Q17]1 


22 


(166) 

(167) 

(168) 

(169) 

(170) 

(171) 

(172) 

(173) 

(174) 

(175) 

(176) 

(177) 




2 ^ 

a 1 c 1 V' 

n = . y W1 0 (i) 

afc.+d.  8 

111  1=0 


The game optimal strategy for the minimizing player is then
given by

    u_1 = \pi_1 x_2 + \pi_2 x_1 + \pi_3 u_2 + \pi_4        (179)

A similar evaluation of (48) leads to

    v_1 = \pi_5 y_2 + \pi_6 y_1 + \pi_7 v_2 + \pi_8        (180)

At this point, it is seen that the game optimal strategies, under
the assumptions given, are linear in the observations and past controls.
Because of the complexity of the computations needed to achieve (179)
and (180), there is little that can be seen in the way of structure beyond
linearity.


Substituting (179) and (180) into (64) leads to

    \gamma_1(z_2, x_2, y_2, u_2, v_2) = \mu_0 + c_1 (\mu_1 z_2 + \mu_2 x_2 + \mu_3 u_2 + \mu_4 y_2 + \mu_5 v_2 + \mu_6)^2
        + d_1 (\mu_7 x_2 + \mu_8 z_2 + \mu_9 u_2 + \mu_{10} v_2 + \mu_{11})^2
        - e_1 (\mu_{12} y_2 + \mu_{13} z_2 + \mu_{14} u_2 + \mu_{15} v_2 + \mu_{16})^2        (181)


where 


Jo=(al2cl+dl)rT22^21-(erbl2ci)  ,,62ar21  + [ci(kl+alTr2+blr  6,2+dln22-el,T6  J 2 


(183) 




    \mu_{14} = a_2 \pi_6        (203)

    \mu_{15} = \pi_7 + b_2 \pi_6        (204)

    \mu_{16} = \pi_8        (205)


The  Value  of  the  game,  with  two  stages  to  go,  is  given  by  (66). 


J2  =cl°\1+c2aX22^o+^n  flC2lk2Z2+a2u2+h2v2)2+d2u?-e2vZ 


+cl(ulz2+u2x2+u3u2+M4y2+u5v2+M6) 


+dI(u7x2  + u8z2+u9u2  + M10v2+u11)‘ 


■el(Mi2y2+u13Z2+Mi4U2+Mi5v2+Ui6)‘ 


^ p(^7»  y2*  u2'  v 2^  ^^2*  ^2*  V 2.<  ^2*  ^ 2) 


(206) 


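The assumption of the existence of pure strategies

    u_2 = u_2(x_2)        (207)

    v_2 = v_2(y_2)        (208)

and the satisfaction of the saddle point condition, (68) and (69), lead
to the following sufficient conditions for game optimal strategies:

    (a_2 c_2 k_2 + c_1\mu_1\mu_3 + d_1\mu_8\mu_9 - e_1\mu_{13}\mu_{14}) T_{22} T_{21} z_2
        + (a_2^2 c_2 + d_2 + c_1\mu_3^2 + d_1\mu_9^2 - e_1\mu_{14}^2) u_2
        + (a_2 b_2 c_2 + c_1\mu_3\mu_5 + d_1\mu_9\mu_{10} - e_1\mu_{14}\mu_{15}) T_{22} v_2
        + (c_1\mu_2\mu_3 + d_1\mu_7\mu_9) x_2 + (c_1\mu_3\mu_4 - e_1\mu_{12}\mu_{14}) T_{22} y_2
        + c_1\mu_3\mu_6 + d_1\mu_9\mu_{11} - e_1\mu_{14}\mu_{16} = 0        (196)

    (b_2 c_2 k_2 + c_1\mu_1\mu_5 + d_1\mu_8\mu_{10} - e_1\mu_{13}\mu_{15}) T_{24} T_{21} z_2
        + (a_2 b_2 c_2 + c_1\mu_3\mu_5 + d_1\mu_9\mu_{10} - e_1\mu_{14}\mu_{15}) T_{24} u_2
        - (e_2 - b_2^2 c_2 - c_1\mu_5^2 - d_1\mu_{10}^2 + e_1\mu_{15}^2) v_2
        + (c_1\mu_2\mu_5 + d_1\mu_7\mu_{10}) T_{24} x_2 + (c_1\mu_4\mu_5 - e_1\mu_{12}\mu_{15}) y_2
        + c_1\mu_5\mu_6 + d_1\mu_{10}\mu_{11} - e_1\mu_{15}\mu_{16} = 0        (197)

    a_2^2 c_2 + d_2 + c_1\mu_3^2 + d_1\mu_9^2 - e_1\mu_{14}^2 > 0        (198)

    e_2 - b_2^2 c_2 - c_1\mu_5^2 - d_1\mu_{10}^2 + e_1\mu_{15}^2 > 0        (199)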
(199) 


where

    \theta(x_2, y_2) = T_{21} \alpha(z_2) = \int \alpha(z_2) p(z_2 | x_2, y_2) dz_2        (200)

    T_{21}: L_2[-\infty, \infty; p(z_2 | x_2, y_2)] \to L_2[-\infty, \infty; p(y_2 | x_2)]        (201)

    \theta(x_2) = T_{22} \alpha(y_2) = \int \alpha(y_2) p(y_2 | x_2) dy_2        (202)

    T_{22}: L_2[-\infty, \infty; p(y_2 | x_2)] \to L_2[-\infty, \infty; p(x_2 | y_2)]        (210)

    \theta(y_2) = T_{24} \alpha(x_2) = \int \alpha(x_2) p(x_2 | y_2) dx_2        (211)

    T_{24}: L_2[-\infty, \infty; p(x_2 | y_2)] \to L_2[-\infty, \infty; p(y_2 | x_2)]        (212)

Equations (202) and (203) are solved simultaneously to yield the
game optimal strategies. This is not done here since the operator
algebra does not provide any fresh insights.

An interesting point can be made by comparing (203), (204),
(53), and (54). Only system parameters at stage 1 entered into the
latter, whereas system parameters at stages 1 and 2 appear in the
former. Also, it appears that the variances of the observation noises
show up in (203) and (204). In effect, this means that pure strategies
can exist only where the dynamics allow them and when the observation
noise variances make the observations meaningful. The exact
dependence can be made clear only by actually evaluating (203) and
(204).

6.  EXAMPLE  2 

Example 1 indicated that the game optimal controls used all
information available to each player to determine the control strategy.
In effect, the information was used to better define the state at each
stage. While not obvious, the key piece of information is the assumed
a priori distribution representing physical limitations on the opposing
player's available control magnitude. These assumptions allow a
player to generate an a priori estimate of the state at each stage,
which is then combined with the current observation to produce the
a posteriori estimate of the state.

What happens if the two players have no knowledge of their
opponent's capabilities? In this case, the a priori control distribution
may be taken as one which, in the limit, has infinite variance. The
result is a particularly simple separation solution: the game optimal
control strategy at each stage, except the Nth where an a priori
distribution of the state is assumed available, is the deterministic
control strategy with current observation taking the place of the true
state. At the Nth stage, a more usual strategy (involving noise
variances) is used.

To  show  that  such  is  actually  the  case,  a different  set  of  linear 
transformations  is  used  to  define  the  necessary  conditions.  Instead  of 
(25),  consider  the  following: 


where the independence of the observation noise justifies the statement
that p(y_1 | z_1, x^1, y_2, u_2, v_2) = p(y_1 | z_1). Using this decomposition of
the joint density does not lead to any nice characterization of the
resulting conditional densities although it is equally valid. Its virtue,
as is seen later, rests in the conditioning upon x^1 and u_2, the
information available to player II. In this case, the past is discarded
and, because of the decomposition, the actual evaluation of the
functional form of the control strategies becomes almost trivial.

Using  (212),  the  control  strategies  are 

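    u_1 = -\frac{a_1 c_1 k_1}{a_1^2 c_1 + d_1} \left[ I - \frac{a_1^2 b_1^2 c_1^2}{(a_1^2 c_1 + d_1)(b_1^2 c_1 - e_1)} T_{14} T_{13} T_{12} T_{11} T_{18} T_{17} T_{16} T_{15} \right]^{-1}
          \times T_{14} \left[ 1 + \frac{b_1^2 c_1}{e_1 - b_1^2 c_1} T_{13} T_{12} T_{11} T_{18} \right] z_1        (221)

    v_1 = \frac{b_1 c_1 k_1}{e_1 - b_1^2 c_1} \left[ I - \frac{a_1^2 b_1^2 c_1^2}{(a_1^2 c_1 + d_1)(b_1^2 c_1 - e_1)} T_{18} T_{17} T_{16} T_{15} T_{14} T_{13} T_{12} T_{11} \right]^{-1}
          \times T_{18} \left[ 1 - \frac{a_1^2 c_1}{a_1^2 c_1 + d_1} T_{17} T_{16} T_{15} T_{14} \right] z_1        (222)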


where

    \theta(z_1, x^1, u_2, v_2) = T_{11} \alpha(y_2) = \int \alpha(y_2) p(y_2 | z_1, x^1, u_2, v_2) dy_2        (223)

    \theta(z_1, x^1, u_2) = T_{12} \alpha(v_2) = \int \alpha(v_2) p(v_2 | z_1, x^1, u_2) dv_2        (224)

    \theta(z_1) = T_{13} \alpha(y_1) = \int \alpha(y_1) p(y_1 | z_1) dy_1        (213)

    \theta(x^1, u_2) = T_{14} \alpha(z_1) = \int \alpha(z_1) p(z_1 | x^1, u_2) dz_1        (214)

    \theta(z_1, y^1, u_2, v_2) = T_{15} \alpha(x_2) = \int \alpha(x_2) p(x_2 | z_1, y^1, u_2, v_2) dx_2        (215)

    \theta(z_1, y^1, v_2) = T_{16} \alpha(u_2) = \int \alpha(u_2) p(u_2 | z_1, y^1, v_2) du_2        (216)

    \theta(z_1) = T_{17} \alpha(x_1) = \int \alpha(x_1) p(x_1 | z_1) dx_1        (217)

    \theta(y^1, v_2) = T_{18} \alpha(z_1) = \int \alpha(z_1) p(z_1 | y^1, v_2) dz_1        (218)

and  all  transformations  have  their  domain  and  range  in  the  appropriate 
Hilbert  space.  It  is  clear  that  these  are  not  the  same  transformations 
defined  in  (31)  through  (42),  although  they  are  an  equivalent  set. 

The  required  conditional  densities  are: 


B( z 1 ) = Tj  7 


(217) 


T 11 : p(y2  I z 1'  x ' u2'  v2  ) ~ * 


F^-AF 


Ex2+Gz  j +N- Ou  , -Pv'2  | 


1 , b,Z[c2,(F.A,D] 

12-p(v2  zl’  x ’ u2  g2  + (F-A)(L+D) 


1 2 

x \ v , - — z . - r — u , + 


?:  b2  r b2  2 2[g2  + (F  A)D]  2 b [g2  + (F.a)D_ 


(220)
(226)


EZ-AE 


Fy 7 +Gz . +N-  Ou~  - Pv . 


T 1 6 : P\u2  I z 1’  y > v?  ' = 2 

^ 1 ^ G +<E-A)(M+D) 


u0-  — z . - v, 

2 a^  1 a^  2 


[g2+(E-A)d]  2 a [g2  + (E-A)d] 


The conditional densities for the state, (226) and (230), involve
only the observation at stage 1 because the entire past has been lost,
as it were, by the assumption of infinite variance on the opposing
player's control at stage 2. In concept, this is similar to the way in
which a Kalman filter is initialized (Reference 39). The practical result is to place
all weighting on the current observation. It is the best estimate of the
state at that point.
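
The shift of all weight onto the current observation can be seen in the scalar Gaussian fusion formula. The Python sketch below is an analogy under stated assumptions, not the report's derivation: it fuses a Gaussian a priori estimate with one noisy observation, uses arbitrary illustrative numbers, and stands in for "infinite" variance with a very large finite value. The function name is hypothetical.

```python
def fuse(prior_mean, prior_var, obs, obs_var):
    # Precision-weighted combination of an a priori estimate and one observation.
    prior_weight = 1.0 / prior_var
    obs_weight = 1.0 / obs_var
    post_var = 1.0 / (prior_weight + obs_weight)
    post_mean = post_var * (prior_weight * prior_mean + obs_weight * obs)
    return post_mean, post_var

# Informative prior: the estimate lies between the prior mean and the observation.
informed = fuse(5.0, 1.0, 1.2, 0.3)

# Nearly uninformative prior (variance 1e12): the estimate collapses onto the observation.
diffuse = fuse(5.0, 1.0e12, 1.2, 0.3)
```

As the a priori variance grows, the a posteriori mean tends to the observation and the a posteriori variance tends to the observation noise variance.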


(225) 


(229) 


(230) 


Now  consider  (213).  Using  (223)  through  (230),  it  follows  that 


14 


1 + 


bfCl 

.2 

erbiei 


T T T T 
13  12  11  18
1 +


biS 

.2 

ej-bj  c 


(231.) 


T18T17T16T15T14T13T12T1 1 Z1  X1 


(232) 


so that

    u_1 = -\frac{a_1 c_1 e_1 k_1}{(a_1^2 c_1 + d_1)(e_1 - b_1^2 c_1) + a_1^2 b_1^2 c_1^2} x_1        (233)

In the same manner, it is easy to show that

    v_1 = \frac{b_1 c_1 d_1 k_1}{(a_1^2 c_1 + d_1)(e_1 - b_1^2 c_1) + a_1^2 b_1^2 c_1^2} y_1        (234)


Note that (233) and (234) are precisely the control strategies
(detailed in Chapter 2) for the deterministic case. They could have
been obtained using (47) and (48), with S and T set to zero, but it
would have been difficult to obtain the results in closed form.
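
The deterministic closed forms can be cross-checked numerically. The following Python sketch assumes a single-stage scalar game with integrand c(kz + au + bv)^2 + du^2 - ev^2 and a known state z; it solves the two stationarity equations by Cramer's rule and compares the result with the closed forms of (233) and (234), in which each player substitutes his own observation for z. The parameter values are arbitrary and the function names are hypothetical.

```python
def closed_form(a, b, c, d, e, k, z):
    # Closed-form saddle point controls for the scalar deterministic game.
    denom = (a * a * c + d) * (e - b * b * c) + a * a * b * b * c * c
    u = -a * c * e * k * z / denom
    v = b * c * d * k * z / denom
    return u, v

def solve_stationarity(a, b, c, d, e, k, z):
    # Stationarity in u: (a^2 c + d) u + a b c v = -a c k z
    # Stationarity in v: a b c u - (e - b^2 c) v = -b c k z
    m11, m12 = a * a * c + d, a * b * c
    m21, m22 = a * b * c, -(e - b * b * c)
    r1, r2 = -a * c * k * z, -b * c * k * z
    det = m11 * m22 - m12 * m21
    u = (r1 * m22 - m12 * r2) / det
    v = (m11 * r2 - m21 * r1) / det
    return u, v

PARAMS = dict(a=1.0, b=0.5, c=2.0, d=1.0, e=3.0, k=0.9, z=1.5)
u_closed, v_closed = closed_form(**PARAMS)
u_solved, v_solved = solve_stationarity(**PARAMS)
```

With these values a^2 c + d = 3 and e - b^2 c = 2.5 are both positive, so the stationary point is a saddle point (minimum in u, maximum in v).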

It is a straightforward task to use (64) to show that
2 2 2,  2/  2 , . \ * ,2  2,2,  2/  ,2  \

T _ . .2  . . .2  . aicieiki(aici+di)°nrbIc1dik1  (erb,  cj )° 


2 ,2  2,2,2/  , 2 \ 2 


2”  1 X|  2 X2 


( a fc  i+di  )(e  rbi2c  i )+a  ib  \c  \ ] 


cidieiki 


„ 2 min  max 

0,  + 


(a  fc  l+dl  )(el"  bfcl)  +afbfcl  "2  “2  V2 


/ 


cldlelkl 


(a  r c i +d  i )(e  i - b r c j +a  fb  fc  f 


(k2z2+a2u2+b2v2 ) +d2u2'e2V2 


xp(z?1  u,,v  )d(z,,  u-,,  v,) 

£ 2 2 2 (235) 

The  game  optimal  control  strategies  for  (235)  are  easy  to  find  and  so 
are  not  derived  here. 

VII.  CONCLUDING  COMMENTS 

A major  factor  in  the  solution  of  the  multistage  stochastic 
differential  game  is  the  shared  knowledge  of  the  two  players.  Both 
players  know  the  value  for  all  parameters  of  the  dynamical  equations 
and  the  payoff.  All  density  functions  are  fully  known  to  both  sides. 
Given  this  type  of  structural  knowledge,  it  should  be  clear  that  other 
types  of  strategies  involving  other  information  sets  could  as  easily  be 
used. 


Admissible  control  strategies,  other  than  those  specified  by 
(17),  can  be  handled  in  a similar  manner.  The  main  difference, 
practically  speaking,  is  in  the  form  of  the  linear  transformations  that 
arise from a consideration of the necessary conditions. Some
examples are considered in Chapter 4.


SECTION  IV


SINGLE-STAGE  STOCHASTIC  DIFFERENTIAL  GAMES 


I.  INTRODUCTION 

Chapter  3 derived  all  the  theory  required  to  solve  multistage 
stochastic  differential  games  having  pure  strategies.  This  chapter  is 
limited  to  examples  of  single-stage  scalar  games  involving  Gaussian 
random  variables.  Consequently,  the  theory  already  in  hand  is  used. 

This  chapter  has  two  purposes:  1)  to  show  that  the  solutions  to 
the  single-stage  game  have  a closed  form  solution  (which  may  or  may 
not  be  true  for  the  multistage  case),  and  2)  to  exhibit  the  game 
optimal  control  strategies  that  result  when  different  assumptions  are 
made  concerning  the  information  available  to  each  player  (different 
admissible  strategies). 

All  subscripts  referring  to  the  stage  number  are  absent  since 
there  can  be  no  ambiguity.  Further,  shorthand  notation,  which  is 
obvious  in  context,  is  introduced  as  required  for  convenience. 

Finally,  in  each  of  the  following  examples  the  maximizing 
player  (player  I)  is  assumed  to  have  only  noisy  observations  of  the 
state. 

If the c\sigma^2 term is neglected, then

    J = \min_u \max_v \int \{ c(kz + au + bv)^2 + du^2 - ev^2 \} p(z, x, y, u, v) d(z, x, y, u, v)        (1)

or, since

    p(z, x, y, u, v) = p(u, v | z, x, y) p(z, x, y)        (2)

    J = \int \{ c(kz + au + bv)^2 + du^2 - ev^2 \} p(z, x, y) d(z, x, y)        (3)

where  u and  v are  the  game  optimal  admissible  control  strategies. 
Defining 


\[
I(u,v) = \int \left\{ c(kz+au+bv)^2 + du^2 - ev^2 \right\} p(z,x,y)\, d(z,x,y)
\tag{4}
\]


the  saddle  point  conditions  are 

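\[
I(u+\epsilon\delta, v) - I(u,v)
  = 2\int \left\{ ackz + (a^2c+d)u + abcv \right\} \epsilon\delta\, p(z,x,y)\, d(z,x,y)
  + \int \left\{ a^2c+d \right\} \epsilon^2\delta^2\, p(z,x,y)\, d(z,x,y) \ge 0
\tag{5}
\]

\[
I(u, v+\epsilon\Delta) - I(u,v)
  = 2\int \left\{ bckz + abcu - (e-b^2c)v \right\} \epsilon\Delta\, p(z,x,y)\, d(z,x,y)
  - \int \left\{ e-b^2c \right\} \epsilon^2\Delta^2\, p(z,x,y)\, d(z,x,y) \le 0
\tag{6}
\]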


In  the  following,  it  is  assumed  that 


\[
a^2c + d > 0
\tag{7}
\]

\[
e - b^2c > 0
\tag{8}
\]


So  that  the  game  optimal  control  strategies  are  found  from  the  simul- 
taneous solution  of 


\[
\int \left\{ ackz + (a^2c+d)u + abcv \right\} \epsilon\delta\, p(z,x,y)\, d(z,x,y) = 0
\tag{9}
\]

\[
\int \left\{ bckz + abcu - (e-b^2c)v \right\} \epsilon\Delta\, p(z,x,y)\, d(z,x,y) = 0
\tag{10}
\]

About all that can be said at this point is that $\epsilon$ is a small
number (not zero) and that $\Delta$ is any real function of y. Until
admissible strategies are defined for player II, it is impossible to say
what $\delta$ is a function of, although, in all cases, it is assumed to
be a real quantity.
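
The expansions behind the saddle point conditions (5) and (6) hold pointwise, before any integration against p(z, x, y), so they can be checked with plain arithmetic. The following is a minimal sketch under that reading; the function names and the numerical values are illustrative assumptions and do not come from the report.

```python
def payoff(z, u, v, c, k, a, b, d, e):
    """Integrand of (4): c(kz + au + bv)^2 + d u^2 - e v^2."""
    return c * (k * z + a * u + b * v) ** 2 + d * u ** 2 - e * v ** 2


def u_variation(z, u, v, step, c, k, a, b, d, e):
    """Integrand of (5); `step` stands for the product epsilon * delta."""
    first_order = 2.0 * (a * c * k * z + (a * a * c + d) * u + a * b * c * v) * step
    second_order = (a * a * c + d) * step ** 2
    return first_order + second_order


def v_variation(z, u, v, step, c, k, a, b, d, e):
    """Integrand of (6); `step` stands for the product epsilon * Delta."""
    first_order = 2.0 * (b * c * k * z + a * b * c * u - (e - b * b * c) * v) * step
    second_order = -(e - b * b * c) * step ** 2
    return first_order + second_order


if __name__ == "__main__":
    # Hypothetical parameter values chosen so that (7) and (8) hold.
    params = dict(c=1.5, k=0.8, a=1.2, b=0.7, d=2.0, e=3.0)
    z, u, v, step = 0.9, -0.4, 0.3, 1e-2
    change_u = payoff(z, u + step, v, **params) - payoff(z, u, v, **params)
    change_v = payoff(z, u, v + step, **params) - payoff(z, u, v, **params)
    print(change_u - u_variation(z, u, v, step, **params))
    print(change_v - v_variation(z, u, v, step, **params))
```

Under assumptions (7) and (8) the second-order term is positive for a change in u and negative for a change in v, which is what makes the vanishing of the first-order terms, (9) and (10), sufficient for a saddle point.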

As  in  Chapter  3,  it  is  assumed  that  both  players  know  the 
structure  of  the  game,  the  class  of  admissible  strategies,  the  values 
of  all  system  parameters,  and  the  mean  and  variances  of  all 
distributions. 


2.  EXAMPLE I.  THE MINIMIZING PLAYER HAS NOISY
OBSERVATIONS

Example I is the single-stage case which corresponds to the
derivations and examples of Chapter 3. In this case,

\[
p(u,v \mid z,x,y) = \delta(u - u(x))\, \delta(v - v(y))
\tag{11}
\]

so that it is convenient to decompose p(z, x, y) into

\[
p(z,x,y) = p(z \mid x,y)\, p(y \mid x)\, p(x)
\tag{12}
\]

\[
p(z,x,y) = p(z \mid x,y)\, p(x \mid y)\, p(y)
\tag{13}
\]


Defining the following linear transformations

\[
\beta(x,y) = T_1\alpha(z) = \int \alpha(z)\, p(z \mid x,y)\, dz
\tag{14}
\]

\[
\beta(x) = T_2\alpha(y) = \int \alpha(y)\, p(y \mid x)\, dy
\]

\[
\beta(y) = T_3\alpha(x) = \int \alpha(x)\, p(x \mid y)\, dx
\]


(with suitable domain and range, of course) the necessary conditions
for game optimal control strategies are, from (9) and (10),

\[
ack\, T_2T_1 z + (a^2c+d)\, u + abc\, T_2 v = 0
\]

\[
bck\, T_3T_1 z + abc\, T_3 u - (e - b^2c)\, v = 0
\]


so  that 
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The  required  conditional  densities  are 
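\[
(I - WT_2T_3)^{-1}\, T_2[\,\cdots]
  = \theta_{10}x + \theta_{11}
  + W\left[ \theta_{10}(\theta_{12}x + \theta_{13}) + \theta_{11} \right] + \cdots
\]

\[
  = \frac{\theta_{10}x}{1 - W\theta_{12}}
  + \frac{\theta_{11}}{1 - W}
  + \frac{W\theta_{10}\theta_{13}}{(1 - W)(1 - W\theta_{12})}
\tag{34}
\]

so that

\[
u = -\frac{ack}{a^2c+d}
    \left[ \frac{\theta_{10}x}{1 - W\theta_{12}}
         + \frac{\theta_{11}}{1 - W}
         + \frac{W\theta_{10}\theta_{13}}{(1 - W)(1 - W\theta_{12})} \right]
\tag{35}
\]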




where

(Note that

\[
|W\theta_{12}| < 1
\]

so that the infinite series involving $W\theta_{12}$ in (34) converges,
with the sum being given by a closed form solution.)

A similar exercise leads to

\[
v = \frac{bck}{e - b^2c}
    \left[ \frac{\theta_{14}y}{1 - W\theta_{16}}
         + \frac{\theta_{15}}{1 - W}
         + \frac{W\theta_{14}\theta_{17}}{(1 - W)(1 - W\theta_{16})} \right]
\tag{41}
\]
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where 


14 


2 
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16 


(44) 


"'kSVVR' 


m°n 

”Z — 2 

a +o  ‘ 

z h 


(45) 
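
The closed form in (34) can be checked against a direct summation of the series. The sketch below assumes, reading from the bracketed second term of the series, that the composite transformation maps x to $\theta_{12}x + \theta_{13}$ and leaves constants unchanged, and that |W| < 1 in addition to the stated $|W\theta_{12}| < 1$. The function names and the test values are hypothetical.

```python
def series_sum(x, W, th10, th11, th12, th13, terms=200):
    """Sum W^n * (th10 * x_n + th11), where x_0 = x and x_{n+1} = th12 * x_n + th13."""
    total = 0.0
    x_n = x      # n-fold image of x under the assumed affine map
    w_n = 1.0    # running power W^n
    for _ in range(terms):
        total += w_n * (th10 * x_n + th11)
        x_n = th12 * x_n + th13
        w_n *= W
    return total


def closed_form(x, W, th10, th11, th12, th13):
    """Right-hand side of (34)."""
    return (th10 * x / (1.0 - W * th12)
            + th11 / (1.0 - W)
            + W * th10 * th13 / ((1.0 - W) * (1.0 - W * th12)))


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the report.
    args = dict(x=1.3, W=-0.4, th10=0.6, th11=0.2, th12=0.5, th13=0.1)
    print(series_sum(**args), closed_form(**args))
```

The same check applies to the series for v after replacing x by y and the four coefficients by $\theta_{14}$ through $\theta_{17}$.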


A comparison of (35) and (41) with the example of a single-
stage game, with control strategies specified to be linear (Reference
40),  shows  that  (except  for  a typographical  error)  the  two  solutions 
are  identical  if  the  mean  of  the  a priori  estimate  of  the  state  is  zero 
(m = 0). However, this example proves, by construction of the
solution, that (under the assumptions of linear dynamics, quadratic
payoff function, Gaussian random variables, and noisy observations
for both players) there is no nonlinear pure control strategy that
can do better.

Consider what happens when player I is unable to affect the
outcome of the game (b = 0). In this case, the single-stage game
degenerates  to  a single-stage  minimization  problem  with  solution 
given  by 
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u 


ack 


(46) 


where the terms within the parentheses comprise the minimum
variance estimate of the true state, an example of the well-known
separation property of this class of stochastic control problems.^6 Note
that this happy situation does not exist in (35) since system parameters
and both players' observation noise variances are inextricably mixed
together. In other words, even in this simplest of stochastic
differential games, a separation theorem does not exist.
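
The separation property in the b = 0 case can be illustrated with the standard scalar Gaussian estimate. The sketch assumes an observation model x = z + h, with z normal with mean m and variance var_z, and h an independent zero-mean noise with variance var_h. That model, the function names, and the numbers are assumptions made for illustration; the report's own expressions for the estimate are not reproduced here.

```python
def min_variance_estimate(x, m, var_z, var_h):
    """Conditional mean E[z | x] for x = z + h, z ~ N(m, var_z), h ~ N(0, var_h)."""
    return (var_z * x + var_h * m) / (var_z + var_h)


def posterior_variance(var_z, var_h):
    """Conditional variance of z given x under the same assumed model."""
    return var_z * var_h / (var_z + var_h)


def control_b0(x, m, var_z, var_h, a, c, d, k):
    """Deterministic gain -ack/(a^2 c + d) applied to the estimate (separation)."""
    return -a * c * k / (a * a * c + d) * min_variance_estimate(x, m, var_z, var_h)


def conditional_cost(u, x, m, var_z, var_h, a, c, d, k):
    """E[c (kz + au)^2 + d u^2 | x], written out with the conditional moments."""
    z_hat = min_variance_estimate(x, m, var_z, var_h)
    p = posterior_variance(var_z, var_h)
    return c * (k * k * (z_hat ** 2 + p) + 2.0 * k * a * z_hat * u + a * a * u * u) + d * u * u


if __name__ == "__main__":
    # Hypothetical values for illustration.
    kw = dict(x=1.4, m=0.5, var_z=1.0, var_h=0.25, a=1.2, c=1.5, d=2.0, k=0.8)
    u_star = control_b0(**kw)
    print(u_star, conditional_cost(u_star, **kw))
```

The estimator does not depend on a, c, d, or k, and the gain does not depend on the noise variances; this is the decoupling that the text says is lost in (35).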

Setting  the  observation  noise  variances  to  zero  (perfect 
observations)  leads  directly  to  the  deterministic  solution  found  in 
Chapter  2. 

If only one of the players, say player II, has perfect
observations, then the control strategies are still given by (35) and
(41), except that


(47) 


In this case, player II's control strategy is not identical with the
deterministic one of Chapter 2. Player II's game optimal strategy
still involves terms reflecting the noisy nature of player I's
observations. Again, there is no separation theorem.
3.  EXAMPLE II.  THE MINIMIZING PLAYER HAS PERFECT
OBSERVATIONS AND KNOWS THE MAXIMIZING PLAYER'S
OBSERVATION

This single-stage game corresponds, in terms of information
content, to the work presented in Reference 41. In this case,

\[
p(u,v \mid z,x,y) = \delta(u - u(z,y))\, \delta(v - v(y))
\tag{48}
\]

where  x has  been  dropped  from  consideration  since  x is  identical  to 
z at  all  times  (perfect  information).  Instead  of  (13),  the  remaining 
joint  density  can  be  written 


\[
p(z,y) = p(z \mid y)\, p(y)
\tag{49}
\]


so that only one linear transformation is required

\[
\beta(y) = T\alpha(z) = \int \alpha(z)\, p(z \mid y)\, dz
\tag{50}
\]

which results in

\[
ackz + (a^2c+d)\, u + abcv = 0
\tag{51}
\]

\[
bck\, Tz + abc\, Tu - (e - b^2c)\, v = 0
\tag{52}
\]


Solving (51) and (52) simultaneously leads to

\[
u = -\frac{ack}{a^2c+d}\, z
    - \frac{abc}{a^2c+d} \cdot
      \frac{bcdk}{(a^2c+d)(e-b^2c) + a^2b^2c^2}\, Tz
\tag{53}
\]

\[
v = \frac{bcdk}{(a^2c+d)(e-b^2c) + a^2b^2c^2}\, Tz
\tag{54}
\]
Since the conditional mean of z given y is precisely the
minimum variance estimate of the state, $\hat z$, (54) indicates that, at
last, there is a separation theorem for the maximizing player. If
the error, $\tilde z$, between the true state, z, and the best estimate
of the state, $\hat z$, is introduced, then (53) can be rewritten as

\[
u = -\frac{acek}{(a^2c+d)(e-b^2c) + a^2b^2c^2}\, z
    + \frac{abc}{a^2c+d} \cdot
      \frac{bcdk}{(a^2c+d)(e-b^2c) + a^2b^2c^2}\, \tilde z
\tag{55}
\]

where

\[
\tilde z = z - \hat z
\]


Equation (55) shows that player II's optimal control strategy
can be broken into two parts: one part which is identical to that used
in the fully deterministic case, and a second part which is proportional
to the error in player I's estimate of the true state. As usual, a
linear control strategy results.
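
The algebra leading from (51) and (52) to (53), (54), and (55) can be verified numerically. In the sketch below, z_hat stands for the conditional mean Tz, and Tu is evaluated by replacing z with z_hat in u, which is valid because u is linear in z for fixed y. The function names and numerical values are illustrative assumptions.

```python
def common_denominator(a, b, c, d, e):
    """(a^2 c + d)(e - b^2 c) + a^2 b^2 c^2, the denominator in (53) to (55)."""
    return (a * a * c + d) * (e - b * b * c) + (a * b * c) ** 2


def example2_controls(z, z_hat, a, b, c, d, e, k):
    """Return (u, v): v from (54), then u from condition (51)."""
    D = common_denominator(a, b, c, d, e)
    v = b * c * d * k * z_hat / D
    u = -(a * c * k * z + a * b * c * v) / (a * a * c + d)
    return u, v


def example2_u_split(z, z_hat, a, b, c, d, e, k):
    """Return the two parts of (55): deterministic part and estimation-error part."""
    D = common_denominator(a, b, c, d, e)
    deterministic_part = -a * c * e * k * z / D
    error_part = (a * b * c / (a * a * c + d)) * (b * c * d * k / D) * (z - z_hat)
    return deterministic_part, error_part


if __name__ == "__main__":
    # Hypothetical values satisfying (7) and (8).
    prm = dict(a=1.2, b=0.7, c=1.5, d=2.0, e=3.0, k=0.8)
    u, v = example2_controls(0.9, 0.6, **prm)
    print(u, v, sum(example2_u_split(0.9, 0.6, **prm)))
```

When z_hat equals z the error part vanishes and u reduces to the deterministic strategy, which is the decomposition described in the text.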

This  example  could  be  extended  to  the  multistage  case,  if 
desired,  but  the  results  would  not  match  those  obtained  in  Reference 
41.  Even  though  both  players  have  linear  control  strategies  under 
either formulation, the imposition of the requirement that the
strategies be linear changes the essential character of the solution; a great
deal  more  information  is  available  to  both  players  if  they  know  the 
form  (the  structure)  of  the  strategies.  In  effect,  the  variance  of  the 
estimate  of  the  opposing  player's  past  controls  is  reduced  since 
mere  capability  must  no  longer  be  considered  alone.  Instead,  the 
estimate  depends  on  the  ability  of  each  player  to  estimate  his 
opponent's  observation  — a situation  which  is  much  easier  to  handle. 


SECTION  V 


MULTISTAGE  STOCHASTIC  DIFFERENTIAL  GAMES  WITH 
SPECIFIED  CONTROL  STRATEGIES 


1.  INTRODUCTION

This  chapter  investigates  the  case  of  a multistage  stochastic 
differential  game  wherein  both  players  have  only  noisy  observations  of 
the  true  state.  Unlike  the  work  presented  in  Chapter  3,  the  form  of 
the  control  strategies  for  both  sides  is  specified.  In  particular,  the 
strategies are specified to be linear functions of the present and past
observations available to each player.

The assumption of the form of the control strategy has a major
impact on the method of solution required. Previously, the problem
was one of functional optimization over the class of all strategies, and
the methods of functional analysis were used; now, the problem is
reduced to optimization over a set of parameters, and the ordinary
calculus suffices.

The  method  is  best  presented  by  performing  a two-stage 
example.  The  extension  to  an  N stage  game  is  then  obvious. 

2.  DERIVATION OF THE GAME OPTIMAL LINEAR STRATEGIES

The  problem  is  to  choose  a set  of  parameters  in  the  pure 
control  strategies  which  optimize  (in  a game  sense)  a quadratic 
payoff  functional 
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e.  v. 
(The payoff function is written in terms of scalar states and control
variables for the convenience of doing an example problem, but
everything to be said goes over immediately to the vector case.)
The tilde over the two control strategies, $\tilde u_i$ and $\tilde v_i$,
is meant to denote that the control strategies are restricted to those
having a certain form, which is linear in this example. The min and max
operators are to be evaluated over the set of parameters $\alpha_1$,
$\alpha_{11}$, $\alpha_{12}$, $\alpha_2$, and $\alpha_{22}$ and $\beta_1$,
$\beta_{11}$, $\beta_{12}$, $\beta_2$, and $\beta_{22}$ (denoted by
$\alpha$ and $\beta$, respectively) since

\[
\tilde u_i = \alpha_i + \sum_j \alpha_{ij} x_j
\tag{2}
\]

\[
\tilde v_i = \beta_i + \sum_j \beta_{ij} y_j
\tag{3}
\]

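Thus, solving the last stage game first,

\[
J_1 = \min_{\alpha}\max_{\beta} \int \left\{ c_1z_0^2 + d_1\tilde u_1^2 - e_1\tilde v_1^2 \right\}
      p(z_0, \tilde u_1, \tilde v_1)\, d(z_0, \tilde u_1, \tilde v_1)
\]

\[
    = c_1\sigma^2 + \min_{\alpha}\max_{\beta} \int \Big\{
      c_1\left[ k_1z_1 + a_1(\alpha_1 + \alpha_{11}x_1 + \alpha_{12}x_2)
              + b_1(\beta_1 + \beta_{11}y_1 + \beta_{12}y_2) \right]^2
\]

\[
      + d_1\left[ \alpha_1 + \alpha_{11}x_1 + \alpha_{12}x_2 \right]^2
      - e_1\left[ \beta_1 + \beta_{11}y_1 + \beta_{12}y_2 \right]^2 \Big\}
      p(z_1, x^1, y^1, \tilde u_2, \tilde v_2)\,
      d(z_1, x^1, y^1, \tilde u_2, \tilde v_2)
\tag{4}
\]

where $\alpha$ and $\beta$ in (4) are the sets $\alpha_1$, $\alpha_{11}$,
$\alpha_{12}$ and $\beta_1$, $\beta_{11}$, $\beta_{12}$, respectively.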


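Since (4) is a problem in ordinary calculus, the usual sufficient
conditions for the minimization and maximization of a function
of several variables are applicable, namely

\[
\frac{\partial J_1}{\partial \alpha} = 0
\tag{5}
\]

\[
\frac{\partial J_1}{\partial \beta} = 0
\tag{6}
\]

\[
\frac{\partial^2 J_1}{\partial \alpha^2} > 0
\tag{7}
\]

\[
\frac{\partial^2 J_1}{\partial \beta^2} < 0
\tag{8}
\]

where (5) and (6) are vector equations and (7) and (8) are matrix
equations.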
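
To make the parameter optimization concrete, the following sketch works a much smaller problem than the two-stage example: a single-stage scalar game with zero-mean Gaussian variables and strategies u = alpha * x and v = beta * y, so the constant terms drop out. The observation model (x and y equal to z plus independent noise), the function names, and the numbers are assumptions made for illustration. Setting the first partials to zero gives two linear equations, solved here by Cramer's rule.

```python
def expected_payoff(alpha, beta, a, b, c, d, e, k, var_z, var_x, var_y):
    """E[c(kz + a*alpha*x + b*beta*y)^2 + d(alpha*x)^2 - e(beta*y)^2], zero means."""
    sxx = var_z + var_x          # E[x^2] with x = z + noise
    syy = var_z + var_y          # E[y^2] with y = z + noise
    s = var_z                    # E[z^2] = E[zx] = E[zy] = E[xy]
    quad = (k * k * s + a * a * alpha ** 2 * sxx + b * b * beta ** 2 * syy
            + 2.0 * k * a * alpha * s + 2.0 * k * b * beta * s
            + 2.0 * a * b * alpha * beta * s)
    return c * quad + d * alpha ** 2 * sxx - e * beta ** 2 * syy


def linear_strategy_gains(a, b, c, d, e, k, var_z, var_x, var_y):
    """Solve dJ/d(alpha) = 0 and dJ/d(beta) = 0 for the two gains."""
    sxx, syy, s = var_z + var_x, var_z + var_y, var_z
    m11, m12 = (a * a * c + d) * sxx, a * b * c * s
    m21, m22 = a * b * c * s, -(e - b * b * c) * syy
    r1, r2 = -a * c * k * s, -b * c * k * s
    det = m11 * m22 - m12 * m21   # strictly negative when (a^2 c + d) > 0 and (e - b^2 c) > 0
    alpha = (r1 * m22 - m12 * r2) / det
    beta = (m11 * r2 - m21 * r1) / det
    return alpha, beta


if __name__ == "__main__":
    # Hypothetical values for illustration.
    prm = dict(a=1.2, b=0.7, c=1.5, d=2.0, e=3.0, k=0.8,
               var_z=1.0, var_x=0.5, var_y=0.3)
    alpha, beta = linear_strategy_gains(**prm)
    print(alpha, beta, expected_payoff(alpha, beta, **prm))
```

In this reduced problem the second partials are constants, positive with respect to alpha and negative with respect to beta, so the stationary point is a saddle point in the sense of (7) and (8). With both noise variances set to zero the gains reduce to the deterministic ones.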

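Applying (5) and (6) to (4) yields a set of linear algebraic
equations in $\bar\alpha$ and $\bar\beta$, the optimal values for
$\alpha$ and $\beta$

\[
\frac{\partial J_1}{\partial \alpha_1}
  = 2\int \Big\{ a_1c_1\left[ k_1z_1
      + a_1(\bar\alpha_1 + \bar\alpha_{11}x_1 + \bar\alpha_{12}x_2)
      + b_1(\bar\beta_1 + \bar\beta_{11}y_1 + \bar\beta_{12}y_2) \right]
      + d_1\left[ \bar\alpha_1 + \bar\alpha_{11}x_1 + \bar\alpha_{12}x_2 \right] \Big\}
    p(z_1, x^1, y^1, \tilde u_2, \tilde v_2)\,
    d(z_1, x^1, y^1, \tilde u_2, \tilde v_2) = 0
\tag{9}
\]

\[
\frac{\partial J_1}{\partial \alpha_{11}}
  = 2\int \Big\{ a_1c_1x_1\left[ k_1z_1
      + a_1(\bar\alpha_1 + \bar\alpha_{11}x_1 + \bar\alpha_{12}x_2)
      + b_1(\bar\beta_1 + \bar\beta_{11}y_1 + \bar\beta_{12}y_2) \right]
      + d_1x_1\left[ \bar\alpha_1 + \bar\alpha_{11}x_1 + \bar\alpha_{12}x_2 \right] \Big\}
    p(z_1, x^1, y^1, \tilde u_2, \tilde v_2)\,
    d(z_1, x^1, y^1, \tilde u_2, \tilde v_2) = 0
\tag{10}
\]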


= 2 / a 
Since

\[
p(z_1, x^1, y^1, \tilde u_2, \tilde v_2)
  = \int p(x_1 \mid z_1)\, p(y_1 \mid z_1)\,
         p(z_1 \mid z_2, \tilde u_2, \tilde v_2)\,
         p(z_2, x_2, y_2, \tilde u_2, \tilde v_2)\, dz_2
\tag{15}
\]

under the assumptions of independent observation and process noise,
(9) through (15) can be rewritten as
\[
b_1c_1k_1 E(k_2z_2 + a_2\tilde u_2 + b_2\tilde v_2)
  + a_1b_1c_1\left\{ \bar\alpha_1
      + E(k_2z_2 + a_2\tilde u_2 + b_2\tilde v_2)\, \bar\alpha_{11}
      + E(x_2)\, \bar\alpha_{12} \right\}
  - (e_1 - b_1^2c_1)\left\{ \bar\beta_1
      + E(k_2z_2 + a_2\tilde u_2 + b_2\tilde v_2)\, \bar\beta_{11}
      + E(y_2)\, \bar\beta_{12} \right\} = 0
\tag{19}
\]

where
At this point, it can be seen that $\bar\alpha_1$, $\bar\alpha_{11}$,
$\bar\alpha_{12}$, $\bar\beta_1$, $\bar\beta_{11}$, and $\bar\beta_{12}$
can be found, in terms of system parameters and the expected
values of quantities appearing at stage 2, from the simultaneous
solution of six linear equations, (16) through (21), in six unknowns.
Because of the amount of algebra involved, no attempt has been made
to solve the equations explicitly.

Once these equations have been solved, the principle of
optimality is used to find the game optimal values for $\alpha_2$,
$\alpha_{22}$, $\beta_2$, and $\beta_{22}$,

\[
J_2 = \min_{\alpha}\max_{\beta}
      E\left\{ c_2z_1^2 + d_2\tilde u_2^2 - e_2\tilde v_2^2 + J_1 \right\}
\]
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and 


P(*,.xlpkl,u2,  v2)=  p(Xj  | Z[)  p(yi  | Zj) 

* PUj  U2.u2,v2)  6(u2  -a2  -a22x2) 
x 6(v2  - 92  - 922  y2)p(x2|  z2)  p(y2U2)  p(z2> 

C25) 

Substituting (24) and (25) into (23) and then carrying out all the
indicated integrations lead to a characterization of $J_2$ in terms of
$\alpha_2$, $\alpha_{22}$, $\beta_2$, and $\beta_{22}$. Again setting the
first partial derivatives with respect to each of the four variables
equal to zero, along with the required positive definiteness of the
matrix of second partials with respect to $\alpha_2$ and $\alpha_{22}$
and the negative definiteness of the matrix of second partials with
respect to $\beta_2$ and $\beta_{22}$, leads to the optimal values for
$\alpha_2$, $\alpha_{22}$, $\beta_2$, and $\beta_{22}$ in terms of system
parameters and the a priori variances and means of the various random
variables.

3.  CONCLUDING  COMMENTS 

The specification of a certain form for the control strategies
reduces the conceptual difficulties associated with solving multistage
stochastic differential games, but it does little to reduce the difficulty
of actually finding the correct values for the control strategy
parameters. For instance, at stage i of an N stage game, there are
2(N - i + 2) control strategy parameters to be found by solving a like
number of simultaneous equations (and, at this point, there is no way
of telling whether the $J_i$ resulting from optimization at the ith stage
leads to an equation which is quadratic in $\alpha$ and $\beta$ at the
(i + 1)st stage). Having found $\bar\alpha$ and $\bar\beta$ at the ith
stage, it is still necessary to investigate the eigenvalues of two
N - i + 2 by N - i + 2 matrices which may or may not be functions of
$\bar\alpha$ and $\bar\beta$.
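
The bookkeeping in the preceding paragraph can be written down directly. The sketch below only evaluates the stated count 2(N - i + 2) and the matching matrix dimension; the function names are hypothetical, and the stage index i follows the text's convention in which stage 1 is solved first.

```python
def parameters_at_stage(i, n_stages):
    """Number of control strategy parameters at stage i of an N stage game: 2(N - i + 2)."""
    if not 1 <= i <= n_stages:
        raise ValueError("stage index must lie between 1 and the number of stages")
    return 2 * (n_stages - i + 2)


def second_partials_dimension(i, n_stages):
    """Side length of each of the two matrices of second partials: N - i + 2."""
    return parameters_at_stage(i, n_stages) // 2


if __name__ == "__main__":
    for stage in (1, 2):
        print(stage, parameters_at_stage(stage, 2), second_partials_dimension(stage, 2))
```

For the two-stage example this gives six parameters at stage 1 and four at stage 2, in agreement with the six and four unknowns found in the derivation.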

Nevertheless,  a straightforward  method  of  actually  solving  a 
multistage  stochastic  differential  game  having  pure  strategies  has 
been  developed.  It  is  at  least  possible,  though  tedious,  to  solve  such 
a game  analytically  and  thus  determine  the  effects  of  various  values 
in  system  parameters  and  in  the  a priori  distributions. 

This  work  may  be  compared  to  that  presented  in  Reference  40 
which,  in  part,  solves  the  same  problem.  A major  difference 
between  the  two  is  in  the  handling  of  the  various  conditional  densities 
that  arise.  In  Reference  40,  they  are  summarized  in  terms  of 
Kalman  filters,  while  here  they  are  introduced  directly.  It  appears 
that  recursive  filtering,  while  probably  leading  to  the  same  answer, 
adds  a fair  measure  of  both  conceptual  and  practical  difficulties. 

Finally,  there  is  the  question  of  the  relationship  between  the 
multistage  stochastic  differential  games  when  the  control  strategies 
are  and  are  not  specified  as  to  form.  Since  neither  closed  form 
analytical nor numerical results are available, one can only
speculate as to the differences in the payoff.

The  resulting  linear  control  strategies  (with  and  without  a 
linear  strategy  being  prescribed  a priori)  are  not  identical.  The 
conclusion  is  that  the  control  strategy,  which  is  optimal  over  the  set 
of  all  linear  strategies,  is  not  equivalent  to  the  control  strategy 
(also  linear)  which  is  optimal  over  any  control  strategies.  This 
seeming  contradiction  can,  however,  be  resolved. 




The  reason  for  the  difference  between  the  two  strategies  is 
precisely  the  difference  of  the  information  available.  In  both  cases, 
all  available  information  is  used.  In  this  sense,  the  knowledge  that 
an  opponent  is  limited  to  using  only  one  form  of  a solution  is  merely 
an additional piece of information. Thus, just as changes in
information led to different linear strategies in the examples in
Chapter 4, so too do changes in information in multistage games lead
to changes in strategies.

This  is  an  example  of  the  difference  between  stochastic 
differential  games  and  stochastic  optimal  control.  Unlike  stochastic 
optimal  control,  there  is  no  separation  between  estimation  of  the 
state  and  control.  And,  as  noted  above,  even  for  the  case  of  linear, 
deterministic dynamics, quadratic payoff functions, and Gaussian
random  variables,  the  optimal  strategy,  over  all  strategies,  is 
linear  but  not  equal  to  the  optimal  linear  strategy. 

By  setting  the  appropriate  quantities  to  zero,  optimal  control 
problems  may  be  considered  to  be  special  cases  of  differential  games; 
the  same  statement  is  not  true  in  reverse.  Differential  games  are 
not,  in  general,  mere  extensions  of  optimal  control. 
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SECTION  VI 


FUTURE  WORK 

The  work  presented  in  this  dissertation  leads  one  inevitably  to 
consider  future  areas  of  research. 

For games of perfect information, the question of what
combinations of payoff function and dynamics lead to pure strategies is a
natural one to ask. It also would be useful to know under what
conditions randomized strategies exist and how they are to be found.

The  corresponding  questions  for  continuous  time  games  are 
also  worth  asking.  Does  it,  in  fact,  make  any  sense  to  talk  about 
randomized  strategies  when  a new  control  must  be  chosen  at  every 
instant  of  time? 

Much work remains to be done for stochastic games. Simple
extensions of the work done herein would include the closed form
solution, if one exists, for the multistage vector game of Chapter 3.
Numerical solutions should be of interest in any event.

Also,  the  solution  to  continuous  time  differential  games,  of 
the  type  studied  in  Chapter  3,  would  be  interesting.  It  is  not 
immediately  clear  that  the  same  use  of  conditional  probability  densities 
and  simple  linear  operators  would  produce  answers. 

Still, in the realm of pure strategies, it would be useful to
extend the results to nonlinear problems and to payoff functions that
are not quadratic. With regard to randomized strategies, is it
possible to apply these techniques to stochastic games or must new
ones be developed?


A great deal of information concerning the structure of the
problem is assumed available to both players. Further work might
consider the effect of less information or information in the form of
probability densities. In the same vein, it would be interesting to
know if there is a suitable corollary to adaptive control in the game
situation.

Obviously,  there  is  a great  deal  of  work  yet  to  be  done. 
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APPENDIX 


The following two theorems are taken almost directly from
Chapter 12 of Reference 20, with appropriate changes in notation.


Theorem I (12.2): Let $I_i$ be the payoff function of a continuous game,
and suppose that $I_i$ is continuous in both variables and that
$I_i(u_i, v_i)$ is strictly convex in $u_i$ for every $v_i$. Then there is
a unique optimal strategy for the second player, which is a step function
of first order; i.e., there is a number $\bar u_i$ in the closed interval
[-1, 1] such that the (unique) optimal strategy for the second player is
the step function $H(\bar u_i)$. ($H(\bar u_i)$ is a probability density
function.) The Value $J_i$ of the game is given by the formula

\[
J_i = \min_{-1 \le u_i \le 1}\; \max_{-1 \le v_i \le 1} I_i(u_i, v_i)
\]

and the constant $\bar u_i$ is the unique solution of the equation

\[
\max_{-1 \le v_i \le 1} I_i(\bar u_i, v_i) = J_i
\]


Theorem  I provides  a means  for  finding  the  minimizing 
player's  game  optimal  control  strategy  and  the  Value  of  the  game.  The 
following  theorem  does  the  same  for  the  maximizing  player. 


Theorem II (12.5): Let $I_i$ be the payoff function of a continuous game,
and suppose that $I_i$ is continuous in both variables, that
$\partial I_i(u_i, v_i)/\partial u_i$ exists for each $u_i$ and $v_i$ in
[-1, 1] x [-1, 1], and that $I_i(u_i, v_i)$ is a strictly convex function
of $u_i$ for each $v_i$. Let $H(\bar u_i)$ be the unique optimal strategy
for the second player, and let $J_i$ be the Value of the game. If
$\bar u_i = -1$ or 1, then there is an optimal strategy $H(\bar v_i)$ for
the first player; the constant $\bar v_i$ can be taken to be any number
satisfying the conditions

OSv^l  , 


!i<V 


vi>  = 


J. 

l 


31. 

l. 

3u. 

i 


v. 

l 


if  u.  = - 1 

l 

if  u.  = 1 


If $-1 < \bar u_i < 1$, then there is an optimal strategy for the first
player, which has the form

\[
\alpha H(\bar v_i^1) + (1 - \alpha) H(\bar v_i^2)
\]

and the constants $\alpha$, $\bar v_i^1$, and $\bar v_i^2$ can be taken to
be any numbers satisfying the conditions


-lsv.1  * 1 


-1  Sv.1  S 1 
1 


Osas  l 


Vui 


v.l)  = J. 
1 ' 1 


Ii<ui-  vi2) 


J. 

1 


31. 

i 


* 0 


31. 
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